Chapter 1
Introduction


Computer vision is devoted to describing the contents of images obtained by any process
that involves vision. Such processes include sensing intensity images with CCD video
cameras and building depth maps using laser range-finders. Object recognition is a
subfield of computer vision whose goal is to find known objects in images, such as chairs,
machine parts, and people. Identifying an object as being one of a class, like "chair,"
turns out to be very hard. This is largely because it is difficult to describe precisely what
a chair is, since chairs come in many forms and are identified partly by their shape and
partly by their function. Even though it may be possible to describe a chair in terms
of qualitative properties like "has a back," such descriptions are not precise enough for
computer recognition.

To circumvent this problem, researchers attempt to recognize specific objects and,
particularly, objects that are rigid or have rigid parts, like chairs and machine parts;
Figs. 1-1 and 1-2 show some example objects. Additionally, researchers assume they are
given precise models for the objects they wish to recognize. These models are expected
to contain geometric information about the features on the objects, such as corners and
edges. The information should include how the features are connected together and
how they appear when seen from different viewpoints. Recognizing objects from such
geometrically-defined models is known as "model-based" object recognition. Model-based
recognition has been the paradigm for most object recognition research, and will be for
this work as well.

Given a set of object models, the task is to determine which of the modeled objects
are in the image, if any, and where they are. If there are not many models, recognition
can proceed by looking for the objects one at a time. Even if there is only one object to
recognize as being present or not, there are several factors that make doing so difficult.
The first is that an object appears differently depending on what viewpoint it is seen from,
and every appearance of the object corresponding to some viewpoint must be recognized
as an instance of the object. In addition, the features in images corresponding to the
model contain error, due to artifacts such as inaccuracies in the imaging process, the
effects of illumination, and ambiguities in feature locations. For instance, in Fig. 1-1
the poor focus and lighting make it difficult to see the 3D shape of the bookend, and
the telephone edges that surround the keypad are several pixels wide. Furthermore, the
object of interest may be partially occluded, or may be difficult to discern because other
objects or the background look similar to it. For example, in Fig. 1-2 the telephone is
partially occluded by the clamp and the flashlight, and, in addition, the back edges of
the phone blend in with the white background.

Figure 1-1: Objects that are rigid or have rigid parts

Figure 1-2: Objects that are rigid or have rigid parts

One popular approach to model-based recognition that attempts to account for these
problems is the "alignment" method, as described by Huttenlocher and Ullman
[Huttenlocher88] [Huttenlocher90]. The general idea of alignment is to break the recognition
process into two stages. The first stage uses limited information to hypothesize
viewpoints from where an object might have been seen. For each viewpoint, the second stage
computes how the model would appear in the image if seen from that viewpoint, and
then examines the image to verify if the corresponding hypothesis is correct.

Briefly, the alignment approach uses the following mechanisms to address the
problems mentioned above. To handle the fact that any view of the object could appear
in the image, the method tries all possible minimal sets of matches between model and
image features for hypotheses, where a minimal set contains just enough matches to
compute the viewpoint from which the model was seen. To account for error in the
image features, verification is performed by checking that the predicted appearance of
the model matches the image only approximately. The problem of occlusion is handled
by generating hypotheses using features from the model and the image that are robust
to partial occlusions, such as corner points and pieces of line segments. To deal with
spurious features that arise from other objects and from the background, a bottom-up
process is assumed that groups together image features that are likely to come from the
same object.

There are two major problems with these mechanisms, however. First, the types of
features used for generating hypotheses are easily confused with similar features from
other objects, from shadows, and from the background. For example, any unmodeled
object in Fig. 1-1 or Fig. 1-2 can contribute spurious line segments, as can the wood
grain on the table in Fig. 1-1 and the highlights on the pencil sharpener in Fig. 1-2.
This may place an excessive burden on the grouping process, or alternatively, may lead to a
combinatorial explosion in the number of minimal sets of matches to be verified.

The second problem is with the method used to account for error in the locations of
image features. The problem is that the error can propagate and magnify through the
computations of the viewpoint and the appearance of the model from the viewpoint. As
a result, the predicted appearance of the model may not be approximately the same as
the image, but instead can be very different.


1.1  Problem  Definition 

In light of the problems mentioned, the objective of this thesis is to incorporate error
analysis into alignment-style recognition and use the error analysis to show how to build
an alignment system that is robust and efficient. As suggested above, the system is
intended to recognize a restricted but wide class of 3D objects, specifically, rigid objects
with sharp edges. The objects are represented by a set of geometric models, which are
described in the next section. For simplicity, the system works with a small set of objects,
so they can be dealt with sequentially. As input, the system is given a 2D intensity image,
which may contain instances of the modeled objects. The goal is to find all instances of the
modeled objects in the image, or else state that none are there.


1.2  Representation 

For models, the system expects to be given three data sets for each object: (1) a list
of distinguished points (corners, maxima, minima, and zeros of curvature), (2) a list of
extended edge features (line segments and curve segments), and (3) a complete edge
description. The first data set is for generating hypotheses, the second for checking them
quickly, and the third for verifying them carefully.

The third data set, the complete edge description, should consist of a small number
of point-by-point, viewer-centered, 3D edge maps. The edge maps can be obtained
automatically from edge-based stereo or motion, or by a laser range-finder with a 3D edge
detector. In addition to shape information, the edge maps should include information
about surface markings, which show up in intensity images. At this stage, there should
be little thresholding on the magnitudes and lengths of edges. The edge descriptions shall
be dense and noisy, but should contain enough useful information so that the object is
easily identified.

About five views of each object probably are sufficient, in order to make sure at least
one view covers every part of the object. More views will be necessary if an object has
concavities that can only be seen from a few viewpoints; still, in this case small images
should be sufficient to represent these aspects, and so not much additional storage is
required. In order to predict the appearance of the object from a novel viewpoint, it may
be necessary to combine information from different views. A simple way to do this is to
project the entire contents of all the nearby views. The main problem with this is that
the generated view may contain edges that should not be visible. On the other hand, if
the edge maps contain sufficient 3D information for eliminating most hidden edges, then
this problem will be minor.

The second data set, the extended edges, can be obtained by fitting relatively long
straight lines and curves to the 3D edge maps. The purpose of this stage is to find model
features that individually are useful for identifying the object. Compared to the density
of a complete edge map, there will be very few of this type of feature.

For the extended edges, the representation is expected to be object-centered, which
means that features shared by different views must be combined. Also, the viewpoints
from which the features were seen should be stored with the features, so that self-occlusion
can be largely accounted for. Combining features from different views may not be easy
unless the views are well-registered. Even if they are not, it is possible to do this step by
hand, since model-building is off-line and there are not many extended features.

As a note on smoothly curved objects, the silhouettes should not be used in obtaining
the extended features, although they may give strong edges. The reason is that the
silhouettes of smooth objects are not stable, that is, they can change as the object rotates.
The extended features, on the other hand, are object-centered and may be seen from
widely different viewpoints.

Finally, the features for the first data set, distinguished points, can be extracted from
the extended edges, either by intersecting lines or by finding extrema and inflection
points on curves. There is a separate data set containing point features because they are
straightforward to match between a model and an image. In contrast, extended features
are likely to be broken up by the feature detector or be partially occluded. The particular
point features used here (corners, extrema, and inflection points) were chosen because
they are stable, i.e., can be identified, under projection over a wide range of views. (Note
the double meaning of "stable" here and above.)


1.3  Approach 

The algorithm for recognition that I propose is as follows, and is an extension of
Huttenlocher and Ullman's:

1.  Form  groups  of  image  and  model  features,  and  extract  distinguished  points  from 
these  groups. 

2.  Until there are no remaining pairs of triples, hypothesize a correspondence between
three grouped model points and three grouped image points.

(a) Compute the 3D pose of the model from the three-point correspondence.

(b) Predict the image positions of the extended features of the model using the
3D pose.

(c) Given the error in the image points, compute a region of uncertainty for each
predicted model feature that bounds the range of locations where the feature
could actually lie.

(d) Assign the three-point hypothesis a likelihood based on the uncertainty
regions, using a Bayesian inference mechanism.

(e) If the likelihood is high, select the edge maps of the model that were imaged
from nearby viewpoints. Then perform a careful verification by transforming
and projecting all of them into the image, which requires merging edges that
are the same and eliminating edges that are hidden. Using uncertainty
propagation as a guide, count how much of the projected model contour occurs in
the image.

(f) If the hypothesis verifies, remove all the distinguished image points that have
been accounted for by the model from the current set of distinguished image
points.


3.  Return  the  verified  hypotheses. 
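
The control flow of this hypothesize-and-verify loop can be sketched in a much simplified
setting. The sketch below is not the system described in this thesis: it assumes flat (2D)
point models under an affine transformation in place of 3D pose recovery, it replaces the
uncertainty regions, Bayesian likelihood, and edge-map verification of steps 2c-2e with a
single count of predicted points that fall within a hypothetical tolerance eps of some
image point, it omits grouping, and it stops at the first verified hypothesis rather than
removing accounted-for points and continuing. All function names are illustrative.

```python
from itertools import combinations, permutations
from math import hypot

def affine_coords(p, a, b, c):
    """Return (alpha, beta) with p = a + alpha*(b - a) + beta*(c - a),
    or None if a, b, c are collinear."""
    ux, uy = b[0] - a[0], b[1] - a[1]
    vx, vy = c[0] - a[0], c[1] - a[1]
    det = ux * vy - uy * vx
    if abs(det) < 1e-12:
        return None
    px, py = p[0] - a[0], p[1] - a[1]
    return ((px * vy - py * vx) / det, (ux * py - uy * px) / det)

def predict(model_pts, model_tri, image_tri):
    """Stand-in for steps 2a/2b: map every model point into the image using the
    affine transformation fixed by three model-to-image point matches."""
    a, b, c = model_tri
    a2, b2, c2 = image_tri
    out = []
    for p in model_pts:
        ab = affine_coords(p, a, b, c)
        if ab is None:
            return None
        al, be = ab
        out.append((a2[0] + al * (b2[0] - a2[0]) + be * (c2[0] - a2[0]),
                    a2[1] + al * (b2[1] - a2[1]) + be * (c2[1] - a2[1])))
    return out

def support(pred, image_pts, eps):
    """Stand-in for steps 2c-2e: count predictions within eps of an image point."""
    return sum(1 for q in pred
               if any(hypot(q[0] - r[0], q[1] - r[1]) <= eps for r in image_pts))

def recognize(model_pts, image_pts, eps=1.0, min_support=None):
    """Try pairs of model and image triples; return the first hypothesis that verifies."""
    if min_support is None:
        min_support = len(model_pts)
    for mt in combinations(model_pts, 3):
        for it in permutations(image_pts, 3):
            pred = predict(model_pts, mt, it)
            if pred is not None and support(pred, image_pts, eps) >= min_support:
                return mt, it, pred
    return None
```

Even in this toy form the nested loops over model triples and ordered image triples make
the cost of trying every pairing visible, which is why the number of candidate points per
group matters so much in practice.
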

In this algorithm, triples of feature points are used to form hypotheses and compute
poses. When such features are obtained from an image, they often come with local
orientation information, which this algorithm does not make use of. For example, a
feature point that is actually a corner might come with the line segments that were
intersected to find the corner, or a feature point that is actually a maximum or minimum
point on a curve segment might come with an estimate of the tangent vector at that
maximum or minimum. This information could in theory be used to further constrain
the pose, or to do indexing (which is discussed below), so that all possible corresponding
model features would not have to be examined. Although these steps would be worthwhile
if properly done, it should be noted that local point features, even with orientation
information, are not very distinguishing. Indexing with such features would provide a
useful preprocessing step, but the brunt of the recognition problem would still remain,
so this is what the above algorithm concentrates on.

For the careful-verification stage (step 2e), the edge maps that were imaged from the
nearest viewpoints are used to predict the appearance of the model in the image. For
most edges, this is done by transforming and projecting the edge map point-by-point.
But for edges on the silhouettes of smooth objects, this does not work, since the bounding
contour changes even for small rotations. For these situations, Basri and Ullman have
suggested a method that can be used to bring the silhouettes into the image when the
change of viewpoint is not too large [Basri88].

There are two basic differences between the algorithm listed above and Huttenlocher
and Ullman's. First, Huttenlocher and Ullman's method has no formal notion of
uncertainty in the feature data, whereas here handling uncertainty formally is an integral part
of the algorithm. This is necessary because a small perturbation in a few point features
can lead to a very different appearance of the model in the image. Although this situation
could be avoided by choosing points that are far apart on the object, current grouping
systems tend to locate points that are nearby. Consequently, sets of nearby points arise
often and would cause a system that deals with error in an ad hoc way to break.
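
The sensitivity to nearby points can be illustrated numerically. The following sketch is
only an illustration under simplifying assumptions: it uses 2D affine alignment from three
point matches as a stand-in for the 3D pose computation, and the coordinates and the
one-pixel displacement are made up for the example.

```python
def transfer(p, model_tri, image_tri):
    """Predict the image position of model point p from three point matches
    (2D affine alignment, used here as a stand-in for 3D pose recovery)."""
    (ax, ay), (bx, by), (cx, cy) = model_tri
    ux, uy, vx, vy = bx - ax, by - ay, cx - ax, cy - ay
    det = ux * vy - uy * vx
    px, py = p[0] - ax, p[1] - ay
    alpha = (px * vy - py * vx) / det
    beta = (ux * py - uy * px) / det
    (a2x, a2y), (b2x, b2y), (c2x, c2y) = image_tri
    return (a2x + alpha * (b2x - a2x) + beta * (c2x - a2x),
            a2y + alpha * (b2y - a2y) + beta * (c2y - a2y))

def prediction_shift(model_tri, p, noise=(1.0, 0.0)):
    """Distance the prediction for p moves when the third matched image point
    is displaced by `noise` (the true pose is taken to be the identity)."""
    a, b, c = model_tri
    noisy = (a, b, (c[0] + noise[0], c[1] + noise[1]))
    x0, y0 = transfer(p, model_tri, model_tri)
    x1, y1 = transfer(p, model_tri, noisy)
    return ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5

far_point = (100.0, 100.0)
nearby = ((0.0, 0.0), (5.0, 0.0), (0.0, 5.0))        # matched points 5 pixels apart
spread = ((0.0, 0.0), (100.0, 0.0), (0.0, 100.0))    # matched points 100 pixels apart
print(prediction_shift(nearby, far_point))   # 20.0
print(prediction_shift(spread, far_point))   # 1.0
```

In this example a one-pixel error in a matched point moves the prediction for a distant
model point by twenty pixels when the matched points are close together, but by only one
pixel when they are spread out.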

The second difference from Huttenlocher and Ullman's method is the use of Bayesian
inference to throw away hypotheses that are unlikely. Huttenlocher and Ullman used a
heuristic approach to prune hypotheses quickly. In contrast, the method here will be
derived from first principles, using knowledge of how uncertainty propagates. As a result,
the method here is expected to be considerably more reliable.


1.4  Background

The algorithm described in the preceding section for recognizing three-dimensional
objects grew out of a number of approaches attempted in the past. Perhaps the best way
to argue for its efficacy, then, is to present the development that led to its selection.

To begin, let us consider the choice of features for generating hypotheses. Early
attempts used relatively large features to obtain an initial match between a model and
an image. Examples of such features include convex polygons [Roberts65], projections
of generalized cones [Brooks81] [Biederman85], and moments of inertia of closed regions
[Cyganski85] [Reeves89]. The advantages of large features are that there are only a
few of them in an image and they have few matches in the model. There is, however, a
fundamental problem with these types of features: they are sensitive to partial occlusions
and, as a result, cannot be extracted reliably from images.

To avoid this problem, it is common for recognition systems to extract small, local
features, such as points and line segments, and then to group these together to get sets
of features from the same object [Clark79] [Bolles82] [Bolles83] [Lowe85] [Thompson87]
[Horaud87] [Linainmaa88] [Lamdan88a] [Huttenlocher90]. Although many systems look
only for small groups of features, some of them try to find large ones. Finding large groups
is a distinct problem and has received much attention [Lowe85] [Jacobs87] [Mohan88]
[Horaud90] [Jacobs92]. As with large features, the advantage of large groups of features
is that there are few of them in the image and the model. Ideally, a system would gather
large groups of features, use them to index into a model database, and pull out exactly
those models that contain features that can project to the features in that group. This
approach could lead to very fast recognition, and has been examined for point features
[Jacobs92].

Despite the potential gains from grouping and indexing with large groups, realistically
the chances are that groups will contain spurious features and be missing correct features,
and partial occlusions will make this problem much worse. In order to minimize the
chance of having spurious features in groups, most systems look for groups that are
small, though large enough to determine the pose of the model with respect to the data.
For example, these groups can be pairs of corners [Thompson87], three line junctions
[Horaud87], triples of points [Linainmaa88] [Lamdan88a] [Huttenlocher90], and triples of
lines [Clark79] [Lowe85].

Even if only small groups are available, indexing is needed to rapidly handle small to
medium-sized libraries of objects. Given small groups, a model index table can be built
such that all model groups which could produce a given image group can be immediately
extracted. For groups containing triples of points, any model point triple can produce
any image point triple under projection [Fischler81] [Huttenlocher90], and so indexing
cannot help. For image groups with corners and junctions, the corresponding model
groups will be somewhat constrained, but not substantially, since even these groups are
not very distinguishing.

Still, it is possible to use the idea of Geometric Hashing to gain more power from
indexing [Lamdan88b]. To apply Geometric Hashing to 3D recognition from 2D images,
first project each model orthographically from all different points on a viewing sphere
to reduce the problem to identifying flat models. Then, for each projection, take every
triple of model points and, with respect to each triple, store coordinates of all the other
model points in an index table, along with the model triple, the viewpoint, and the
model. At recognition time, take every triple of image points and, with respect to each
triple, use the coordinates of every other image point to index into the table and pull
out all the model triples with those coordinates. To make the process more reliable, the
look-up table should be built with points drawn from groups in the model and indexed
with points drawn from large groups in the image.
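
The table-building and look-up steps just described can be sketched for a single flat
model. This is a minimal illustration, not the implementation of [Lamdan88b]: it handles
one projection only (no viewing sphere), assumes the coordinates stored with respect to a
triple are affine coordinates in the basis formed by that triple, and quantizes them with
a hypothetical bin size `cell`. The function names and the model "L" are made up for the
example.

```python
from collections import defaultdict
from itertools import permutations

def affine_coords(p, a, b, c):
    """Coordinates (alpha, beta) of p in the basis a, b, c:
    p = a + alpha*(b - a) + beta*(c - a). None if the basis is collinear."""
    ux, uy = b[0] - a[0], b[1] - a[1]
    vx, vy = c[0] - a[0], c[1] - a[1]
    det = ux * vy - uy * vx
    if abs(det) < 1e-12:
        return None
    px, py = p[0] - a[0], p[1] - a[1]
    return ((px * vy - py * vx) / det, (ux * py - uy * px) / det)

def quantize(coords, cell):
    """Map continuous coordinates to a hash-table bin."""
    return (round(coords[0] / cell), round(coords[1] / cell))

def build_table(models, cell=0.25):
    """models: dict name -> list of 2D points.
    Returns a table mapping bin -> list of (model name, basis index triple)."""
    table = defaultdict(list)
    for name, pts in models.items():
        for basis in permutations(range(len(pts)), 3):
            a, b, c = (pts[i] for i in basis)
            for k, p in enumerate(pts):
                if k in basis:
                    continue
                coords = affine_coords(p, a, b, c)
                if coords is not None:
                    table[quantize(coords, cell)].append((name, basis))
    return table

def vote(table, image_pts, image_basis, cell=0.25):
    """For one image basis triple, count votes for each (model, model basis) entry."""
    a, b, c = (image_pts[i] for i in image_basis)
    votes = defaultdict(int)
    for k, p in enumerate(image_pts):
        if k in image_basis:
            continue
        coords = affine_coords(p, a, b, c)
        if coords is not None:
            for entry in table.get(quantize(coords, cell), ()):
                votes[entry] += 1
    return votes

models = {"L": [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (5.0, 5.0), (2.0, 7.0)]}
table = build_table(models)
# Image: the model rotated by 90 degrees, scaled by 2, and shifted.
image = [(10.0 - 2.0 * y, 2.0 * x - 3.0) for x, y in models["L"]]
votes = vote(table, image, (0, 1, 2))
print(votes[("L", (0, 1, 2))])   # 2: both non-basis image points agree
```

Because affine coordinates are unchanged by the transformation, the two remaining image
points index the same bins that the corresponding model points were stored in.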

Although performing indexing this way can provide considerable filtering of
hypotheses, it often will not, since, as mentioned, point features are not very distinguishing.
The problem is that small sets of point features are easily confused with randomly-placed
points when there is a significant amount of clutter in the scene, which means that "false
positives" are likely. ([Grimson92b] gives an analysis of the likelihood of false positives
in Geometric Hashing for flat objects.) The chance of false positives is further increased
by taking all views of the model and, for each view, using all triples. As a consequence
of false positive problems, an indexing system should be backed up with a system that
tries all possible correspondences of model and image groups to find an initial match.

It may seem like a lot of work to try all possible corresponding model and image
groups. For point features matched between a 3D model and a 2D image, for instance,
the minimal size of a group is three [Fischler81] [Huttenlocher90]; so using points means
trying all pairs of model and image triples, which, for m model points and s image points,
is an O(m^3 s^3) process. Nevertheless, consider again the possibility of using large
groups. Recall that the trouble with large groups is they are not reliable enough to do
indexing. Instead of using the groups for indexing, we can select triples of points from
them, as was suggested for Geometric Hashing. The idea is that an unreliable group
from the correct object should have at least three correct points. This would reduce the
number of triples to try to a manageable level, because m would be the average number
of points in a model group and s would be the average number of points in an image
group. As a result, although the asymptotic complexity of O(m^3 s^3) is considerable, with
grouping we can expect the number of possibilities in practice to be small enough that it
will be the constant time for checking a single match that decides whether the method
is feasible. More generally, this is the argument that for real recognition problems, the
constant factors often make the difference in whether an algorithm is efficient or not
[Grimson90a].
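
To make the effect of grouping concrete, the number of triple pairings can be counted
directly. The counts below are illustrative only: the point totals (50 model points, 200
image points, groups of 8 points) are hypothetical, and the factor of 3! assumes that each
ordering of the three image points is tried as a separate correspondence.

```python
from math import comb

def triple_pairings(m, s):
    """Number of correspondences between a model triple and an image triple:
    choose 3 of m model points, 3 of s image points, and one of 3! orderings."""
    return comb(m, 3) * comb(s, 3) * 6

print(triple_pairings(50, 200))  # 154455840000: whole model against whole image
print(triple_pairings(8, 8))     # 18816: one model group against one image group
```

With m and s reduced to group sizes, the count drops by several orders of magnitude, so
the cost of checking a single hypothesis becomes the deciding factor.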

For the reasons mentioned, bootstrapping recognition by considering all pairs of
minimal sets of features is very popular. Since a minimal set of matched features is insufficient
to identify an object, the minimal sets are used to find larger sets. Most techniques that
do this can be divided into two broad classes, constrained search and transform
clustering. Constrained search starts from each minimal hypothesis and repeatedly uses the
current set of matches to constrain the search for an additional match, until a large set
of matches is found [Clark79] [Brooks81] [Bolles82] [Goad83] [Grimson84] [Lowe85]
[Ayache86] [Horaud87]. Transform clustering, on the other hand, uses every correspondence
between a minimal set of model and image features individually to compute a
model-to-image transformation, and then counts the number of times each transformation is
repeated [Ballard81] [Turney85] [Thompson87] [Linainmaa88] [Cass90].

It is informative to review the motivations behind these two classes of recognition
techniques. The idea of constrained search is clear, namely, to use a set of known matches
to find more matches. Due to uncertainty in the positions of the features, however,
this process can be difficult, since for each unmatched model feature there typically are
several image features to which it can match. To handle this reliably, many systems use
an extensive backtracking search [Bolles82] [Goad83] [Grimson84] [Horaud87].

The transform clustering approach avoids extensive search by noting that each
correct match will independently vote for the correct transformation, so we can just let
all the matches vote and then take the transformation with the most votes. Usually,
this method is implemented by dividing transformation space into buckets, having each
match increment a counter in a bucket, and searching the space for the buckets with the
highest counts. In this form, the method is known as the "generalized Hough transform"
[Ballard81].
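
The bucket-voting scheme can be sketched in a deliberately reduced setting. The example
below assumes the transformation is a pure 2D translation, so that a single point match
determines it and the transformation space is two-dimensional; the 3D-from-2D problem
discussed in this chapter has a far larger space. The bucket size and point coordinates
are made up for illustration.

```python
from collections import Counter

def hough_translation(model_pts, image_pts, bucket=2.0):
    """Let every model-image point pairing vote for the translation it implies,
    accumulating votes in square buckets of side `bucket`."""
    votes = Counter()
    for mx, my in model_pts:
        for ix, iy in image_pts:
            key = (int((ix - mx) // bucket), int((iy - my) // bucket))
            votes[key] += 1
    return votes

model = [(0, 0), (4, 0), (0, 3), (5, 5)]
# Image: the model shifted by (20, 10), plus two clutter points.
image = [(x + 20, y + 10) for x, y in model] + [(7, 31), (40, 2)]
votes = hough_translation(model, image)
best_bucket, count = votes.most_common(1)[0]
print(best_bucket, count)   # (10, 5) 4
```

The four correct pairings all fall in the bucket containing the true translation, while
the twenty incorrect pairings scatter over other buckets.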

As a recognition technique, the generalized Hough transform has several difficulties:
(1) Unless the buckets are very small, the bucketing can lead to false peaks (false positives)
in transformation space, since buckets combine together different transforms. (2) A more
serious difficulty is that for 3D recognition from 2D images, a transformation has six
degrees of freedom, which implies transformation space is six-dimensional; such a space
is impractical to store and to search. For this reason, these systems do not perform
a full Hough transform, but instead use separate spaces for subsets of the parameters.
The effect of using separate spaces is to increase again the likelihood of false positives.
(3) An additional problem is that the local features that typically are used to compute
transformations between 2D images and 3D models are prone to being confused with
spurious features in the image, which also can make the method prone to false positives.
(4) Furthermore, it is difficult to handle uncertainty in the image features used to compute
the transformation. Grimson et al. provide a way of obtaining bounds on the uncertainty
in transformation space [Grimson92a], but such overestimates further increase the false
positive probability. (See [Grimson90a] for an analysis of the false positive rates involved
with applying the generalized Hough transform.)

For the reasons mentioned, a more reasonable use of the Hough transform is as a coarse
filter to produce sets of possibly corresponding model and image features [Grimson87].
Such a stage could help considerably when looking for matches between the model and
an entire image, but it may not be useful if effective grouping is available.

The preceding techniques do a lot of work after they are given an initial match in order to
find a large set of matches. Intuitively, this seems peculiar, since, up to some uncertainty
in the data, the initial match determines the pose of the model. It would seem, then,
that the preceding techniques are just pinning down the model pose more precisely. As
mentioned above, the reason this process is difficult is that each predicted model feature
potentially matches a number of image features. Nevertheless, to resolve this ambiguity it
may not be necessary to resort to constrained search or transform clustering. Instead, the
ambiguity could be resolved for all model features simultaneously, by physically moving
them in unison around the image. This can be done by moving the matched image
features around their error regions, while continually updating the image locations of
the predicted model features. The predicted model features are moved until a position
is found that consistently matches most of them to within the error regions of image
features. This method is equivalent to the currently-used techniques, in the sense that it
will find the same set of consistent matches. At the same time, it should avoid the search
through correspondence space or transformation space that they incur.

Another way to improve on earlier recognition methods is to make use of the fact
that once the pose of the object is known, in theory the object's entire appearance in
the image can be predicted. That is, complete edge maps could be used, instead of just
a sparse set of features. This observation is the basis of Ullman's idea of using pictorial
descriptions to recognize objects [Ullman89], and is one of the main ideas behind the
recognition system built by Huttenlocher and Ullman [Huttenlocher88] [Huttenlocher90].

Although considerably more accurate recognition can be achieved if complete edge
maps are used for verification, the expense in time of such an extensive verification would
be prohibitive if it had to be done for all hypotheses. Instead, it is possible to first use the
set of sparse model features to filter out a large percentage of the hypotheses. Importantly,
we can do this without resolving for each predicted model feature to which of its nearby
image features it corresponds. Specifically, we can compute a Bayesian estimate of the
probability that a hypothesis is correct given the situation in the image (see Chapter 6).
Then, once most of the hypotheses have been filtered, careful verification using complete
edge maps can be performed.

In sum, a viable approach to recognition begins by locating large groups of local
features in the image and the model. Then hypotheses can be formed by selecting
triples of points from the groups and matching them. These matches are first checked
quickly using Bayesian inference to decide how likely they are. Then they are verified
carefully using detailed, viewer-centered edge maps. Also, this careful verification should
be augmented to account for uncertainty in the data by trying various projections of the
model.

1.5  Overview 

Section 1.3 gave an algorithm for performing alignment-based recognition. The major
modules of the proffered alignment algorithm are (1) grouping, (2) 3D pose computation
and alignment of model features, (3) computing the likelihood of a hypothesis, and (4)
careful and accurate verification.

This thesis focuses on the second and third modules. In the alignment algorithm,
these modules constitute steps 2a-2d. The reason the grouping stage is not studied
is that, as noted earlier, it is a distinct problem which is receiving much attention in
the literature. By showing how to build a reliable recognition system independent of
grouping, we may be able to infer how much is expected from a grouping stage in terms
of reliable groups.

The fourth module is also a distinct problem, because it uses a different representation
than the second and third modules use. In particular, the second and third modules use
sparse, object-centered features, such as points and line segments, to generate hypotheses,
to compute poses, and to assign likelihoods. In contrast, the fourth module uses detailed,
viewer-centered edge maps to perform careful verification. In fact, the first three modules
comprise an alignment system by themselves, since for many objects a sparse set of
extended features is sufficient to identify them.

Chapter 2 gives a new method for computing 3D pose from three corresponding
points and aligning a model to an image. The method is intended to be faster than
earlier approaches, which is important because the pose computation is repeated many
times. In addition, the solution is proved to be correct and is explained geometrically.
Furthermore, earlier solutions to the problem are presented and compared, and
the stabilities of both the new and earlier solutions are analyzed.

Chapter 3 shows how to compute uncertainty regions for point features, and their
selectivities. Computing the uncertainty regions quickly depends critically on the fast
3D pose computation of Chapter 2. Chapter 4 extends the analysis to line segments.
Chapter 5 discusses how to use the expected selectivities for deriving formal thresholds
for verification and for deciding how much scene clutter is acceptable. Chapter 6 derives
a measure for ranking the hypotheses, using the selectivity formulas of Chapters 3 and 4.
Lastly, Chapter 7 is the conclusion, and Chapter 8 mentions future work.


Chapter  2 

3D  Pose  from  3  Points  using 
Weak-Perspective

This chapter gives a new method for performing steps 2a and 2b of the alignment
algorithm (Section 1.3). Specifically, this chapter shows how to compute the 3D pose of a
model from three corresponding model and image points (step 2a), and how to use the
pose solution to compute the image position of any unmatched model point. For step
2b, the image positions of the extended model features can be computed using points,
like the endpoints of a line segment. In addition, the next chapter shows that the
solution for the image position of an unmatched model point is also useful for step 2c of
the alignment algorithm, in which the uncertainty regions for the predicted model
features are computed. More generally, the pose solution is useful for many approaches to
object recognition, such as constrained search and transform clustering (pose clustering)
(Section 1.4). This is because these approaches frequently use correspondences between
minimal sets of model and image features to compute poses of the model.

For computing poses of 3D objects from 2D images, a model of projection must
be selected, and typically either perspective or "weak-perspective" projection is chosen.
Weak-perspective projection is an orthographic projection plus a scaling, which serves
to approximate perspective projection by assuming that all points on a 3D object are at
roughly the same distance from the camera. The justification for using weak-perspective
is that in many cases it approximates perspective closely, in particular if the size of
the model in depth is small compared to the depth of the model centroid. For both
perspective and weak-perspective, the minimal number of points needed to compute a
model pose up to a finite number of solutions is three [Fischler81] [Huttenlocher90]. For
point features, then, the problem is to determine the pose of three points in space given
three corresponding image points. When perspective projection is the imaging model,
the problem is known as the "perspective three-point problem" [Fischler81]. When
weak-perspective is used, I shall call the problem the "weak-perspective three-point problem."
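
The claim that weak-perspective approximates perspective closely when the model's extent in depth is small relative to the depth of its centroid can be checked numerically. The following is a minimal sketch, not part of the thesis: the function names, the sample points, and the choice of scale s = f / (centroid depth) are assumptions made for illustration.

```python
import math

def perspective(point, f):
    """Perspective projection of a camera-centered 3D point (focal length f)."""
    x, y, z = point
    return (f * x / z, f * y / z)

def weak_perspective(point, s):
    """Orthographic projection followed by a uniform scaling s."""
    x, y, _ = point
    return (s * x, s * y)

def relative_gap(points, f):
    """Largest image-plane gap between the two models, in model units."""
    z0 = sum(p[2] for p in points) / len(points)  # depth of the centroid
    s = f / z0                                    # assumed common scale
    gap = max(math.dist(perspective(p, f), weak_perspective(p, s))
              for p in points)
    return gap / s

# Hypothetical object with depth extent 2, seen at depth 10 and at depth 100.
near = [(1.0, 0.0, 9.0), (0.0, 1.0, 10.0), (-1.0, -1.0, 11.0)]
far = [(x, y, z + 90.0) for (x, y, z) in near]
print(relative_gap(near, 1.0), relative_gap(far, 1.0))
```

With the same object moved ten times farther away, the gap between the two projections shrinks by roughly the same factor, which is the regime in which the approximation is intended to be used.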

Although perspective (central) projection is a more accurate model, numerous
researchers have used weak-perspective projection instead [Roberts65] [Kanade83]
[Cyganski85] [CyganskiOrr88] [Thompson87] [Ullman86] [Ullman89] [Lamdan88a] [Lamdan88b]
[Huttenlocher88] [Basri88] [Huttenlocher90] [Ullman91] [Jacobs91] [Grimson92a]
[Grimson92b]. The reason is that there are some advantages to using weak-perspective instead
of perspective. In particular, computations involving weak-perspective often are less
complicated. In addition, the weak-perspective math model is conceptually simpler, since it
uses orthographic instead of perspective projection. Another advantage is that we do
not need to know the camera focal length or center point. Further, the effect on
object recognition of errors in the image points has been studied only for weak-perspective
projection [Costa90] [Jacobs91] [Lamdan91] [Rigoutsos91] [Grimson92a].

This chapter provides a new approach to recovering the pose for weak-perspective
projection, which leads to a solution (method) that is intuitively simpler than earlier
methods [Kanade83] [Huttenlocher87] [Cyganski88] [Huttenlocher90] [Grimson92b]. The
approach here is motivated geometrically, whereas earlier methods typically are based
on algebraic constraints derived from the rigidity of 3D rotations. Additionally, the
geometric approach makes it easier to view what happens for special configurations of
the points (Section 2.5).

A review of previous methods along with a unified presentation of their solutions is
given in Sections 2.8 and [illegible]. Huttenlocher and Ullman [Huttenlocher90] proved that the
pose solution exists and is unique, which also is done here (Section 2.4). The solution
here most resembles Ullman's [Ullman89] [Huttenlocher87], in that both end up having
to solve the same biquadratic equation, although each derives the biquadratic differently.
Unlike Ullman's solution, this chapter resolves which of the two non-equivalent solutions
to the biquadratic is correct. Also, it explains graphically why the solutions arise and to
what geometry each corresponds (Section 2.4).

In addition to providing a geometric interpretation, the solution in this chapter leads
to direct expressions for the three matched model points in camera-centered coordinates
as well as an expression for the image position of any additional, unmatched model point
(Section 2.7). In contrast, earlier methods all require the intermediate computation of
a model-to-image transformation. Specifically, earlier solutions compute an initial
transformation that brings the model into image coordinates, and then compute an additional
transformation to align the matched model points to their corresponding image points.
This is meaningful because many recognition systems (including the alignment algorithm
in Section 1.3) calculate the 3D pose solution many times while searching for the correct
pose of the model [Fischler81] [Thompson87] [Huttenlocher87] [Linnainmaa88]
[Huttenlocher90] [Jacobs91] [Ullman91]. Consequently, avoiding the intermediate calculation of
the transformation could cause such systems to run faster.


2.1  The  Perspective  Case 

There is an intrinsic geometry that underlies the perspective three-point problem; it
is shown in Fig. 2-1. In the figure, the three model points, $\vec{m}_0$, $\vec{m}_1$, and $\vec{m}_2$, are being
perspectively projected onto three image points, $\vec{i}_0$, $\vec{i}_1$, and $\vec{i}_2$, via lines through the center
of projection (center point), $\vec{p}$. The task is to recover $\vec{m}_0$, $\vec{m}_1$, and $\vec{m}_2$. The essential
information is contained in the side lengths and angles of the surrounding tetrahedron.

As pictured in Fig. 2-1, I will work in camera-centered coordinates with the center
point at the origin and the line of sight along the z axis. Looking at the essential
parameters, the distances $R_{01}$, $R_{02}$, and $R_{12}$ come from the original, untransformed model
points. Also, the angles $\theta_{01}$, $\theta_{02}$, $\theta_{12}$ can be computed from the positions of the image
points, the focal length, and the center point. To see this, let $f$ equal the focal length,
and let the image points $\vec{i}_0$, $\vec{i}_1$, $\vec{i}_2$ be extended as follows: $(x, y) \rightarrow (x, y, f)$. Then

    $\cos\theta_{01} = \hat{i}_0 \cdot \hat{i}_1, \quad \cos\theta_{02} = \hat{i}_0 \cdot \hat{i}_2, \quad \cos\theta_{12} = \hat{i}_1 \cdot \hat{i}_2,$    (2.1)

where in general $\hat{v}$ denotes the unit vector in the direction of $\vec{v}$. The problem is to
determine $a$, $b$, and $c$ given $R_{01}$, $R_{02}$, $R_{12}$, $\cos\theta_{01}$, $\cos\theta_{02}$, and $\cos\theta_{12}$. From the picture,
we see by the law of cosines that

    $a^2 + b^2 - 2ab\cos\theta_{01} = R_{01}^2$    (2.2)
    $a^2 + c^2 - 2ac\cos\theta_{02} = R_{02}^2$    (2.3)
    $b^2 + c^2 - 2bc\cos\theta_{12} = R_{12}^2$    (2.4)

Over time, there have been many solutions to the problem, all of which start with the
above equations. The solutions differ in how they manipulate the equations when solving
for the unknowns. Recently, Haralick et al. reviewed the various solutions and examined
their stabilities [Haralick91].

Figure 2-1: Model points $\vec{m}_0$, $\vec{m}_1$, and $\vec{m}_2$ undergoing perspective projection to
produce image points $\vec{i}_0$, $\vec{i}_1$, and $\vec{i}_2$. $a$, $b$, and $c$ are distances from the center point,
$\vec{p}$, to the model points.

Given $a$, $b$, and $c$, we easily can compute the 3D locations of the model points:

    $\vec{m}_0 = a\,\hat{i}_0, \quad \vec{m}_1 = b\,\hat{i}_1, \quad \vec{m}_2 = c\,\hat{i}_2.$    (2.5)

If a 3D rigid transformation is desired, it can be determined from the original 3D model
points and the 3D camera-centered model points just computed. A simple method for
doing so is given in Appendix A; for a least-squares solution, see [Horn86].
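
The perspective relations above can be sketched in a few lines. This is an illustrative sketch only: the function names and the test configuration are assumptions, and the distances a, b, c are taken as given rather than solved for, since the thesis defers their solution to the literature.

```python
import math

def unit(v):
    """Unit vector in the direction of v."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def view_angle_cosines(i0, i1, i2, f):
    """Equation 2.1: extend (x, y) to (x, y, f), normalize, take dot products."""
    e0, e1, e2 = (unit((x, y, f)) for (x, y) in (i0, i1, i2))
    return dot(e0, e1), dot(e0, e2), dot(e1, e2)

def camera_points(i0, i1, i2, f, a, b, c):
    """Equation 2.5: scale each unit line-of-sight vector by its distance."""
    result = []
    for dist, (x, y) in ((a, i0), (b, i1), (c, i2)):
        e = unit((x, y, f))
        result.append(tuple(dist * comp for comp in e))
    return tuple(result)

# Hypothetical configuration: points (0,0,10), (3,4,12), (-6,0,8) seen with f = 1.
i0, i1, i2 = (0.0, 0.0), (0.25, 1.0 / 3.0), (-0.75, 0.0)
print(camera_points(i0, i1, i2, 1.0, 10.0, 13.0, 10.0))
```

Given the recovered points, the law-of-cosines constraints (Equations 2.2-2.4) can be used as a consistency check on any candidate (a, b, c).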


2.2  Summary  of  3D  Pose  and  Direct  Alignment 


Similar to the perspective case, there is an intrinsic geometry underlying the
weak-perspective three-point problem, shown in Fig. 2-2. The picture shows the three model
points being projected orthographically onto the plane that contains $\vec{m}_0$ and is parallel
to the image plane, and then shows them being scaled down into the image. In addition,
the picture shows the model points first being scaled down and then projected onto the
image plane. In each case, the projection is represented by a solid with right angles as
shown. The smaller solid is a scaled-down version of the larger. The relevant information
consists of the side lengths of the solids and the scale factor.

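For reference, this section summarizes how to compute the locations of the three
matched model points and the image location of any additional, unmatched model point.
The expressions will be discussed in Section 2.3 and derived in Secs. 2.4 and 2.7. Let the
distances between the model points be $(R_{01}, R_{02}, R_{12})$, and the corresponding distances
between the image points be $(d_{01}, d_{02}, d_{12})$. Also let

    $a = (R_{01} + R_{02} + R_{12})(-R_{01} + R_{02} + R_{12})(R_{01} - R_{02} + R_{12})(R_{01} + R_{02} - R_{12})$

    $b = d_{01}^2(-R_{01}^2 + R_{02}^2 + R_{12}^2) + d_{02}^2(R_{01}^2 - R_{02}^2 + R_{12}^2) + d_{12}^2(R_{01}^2 + R_{02}^2 - R_{12}^2)$
    $\phantom{b} = R_{01}^2(-d_{01}^2 + d_{02}^2 + d_{12}^2) + R_{02}^2(d_{01}^2 - d_{02}^2 + d_{12}^2) + R_{12}^2(d_{01}^2 + d_{02}^2 - d_{12}^2)$

    $c = (d_{01} + d_{02} + d_{12})(-d_{01} + d_{02} + d_{12})(d_{01} - d_{02} + d_{12})(d_{01} + d_{02} - d_{12})$

    $\sigma = 1$ if $d_{01}^2 + d_{02}^2 - d_{12}^2 \le s^2(R_{01}^2 + R_{02}^2 - R_{12}^2)$, and $\sigma = -1$ otherwise.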


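Figure 2-2: Model points $\vec{m}_0$, $\vec{m}_1$, and $\vec{m}_2$ undergoing orthographic projection plus scale $s$
to produce image points $\vec{i}_0$, $\vec{i}_1$, and $\vec{i}_2$.

Then if $a \neq 0$ (otherwise see Section 2.5.3), the unknown parameters of the geometry in
Fig. 2-2 are

    $s = \sqrt{\dfrac{b + \sqrt{b^2 - ac}}{a}}$    (2.6)

    $(h_1, h_2) = \pm\left(\sqrt{(sR_{01})^2 - d_{01}^2},\ \sigma\sqrt{(sR_{02})^2 - d_{02}^2}\right)$    (2.7)

    $(H_1, H_2) = \dfrac{1}{s}(h_1, h_2)$    (2.8)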


(In practice, $|b^2 - ac|$ should be used for the inner radicand in Equation 2.6, because
numerical roundoff error can cause it to become negative.)

Given image points $\vec{i}_0 = (x_0, y_0)$, $\vec{i}_1 = (x_1, y_1)$, and $\vec{i}_2 = (x_2, y_2)$, the pose solution can
be used to compute the 3D locations of the model points in camera-centered coordinates:

    $\vec{m}_0 = \frac{1}{s}(x_0, y_0, w), \quad \vec{m}_1 = \frac{1}{s}(x_1, y_1, h_1 + w), \quad \vec{m}_2 = \frac{1}{s}(x_2, y_2, h_2 + w),$    (2.9)

where $w$ is an unknown offset in a direction normal to the image plane. It is worth noting
that if the 3D rigid transform that brings the model into camera-centered coordinates
is desired, it can be computed from these three camera-centered model points and the
original three model points. The unknown offset $w$ drops out when computing the
rotation and remains only in the z coordinate of the translation, which cannot be recovered.
As mentioned in Section 2.1, a simple method for computing the transform is given in
Appendix A, and a least-squares solution is given in [Horn86].
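
The summary above translates fairly directly into code. The sketch below is one possible reading, under stated assumptions: the function names are hypothetical, only the solution with h1 >= 0 is returned (the reflected solution negates both altitudes), and clamping the outer radicands at zero is an added safeguard against roundoff, alongside the |b^2 - ac| device that the text itself recommends.

```python
import math

def weak_perspective_pose(R01, R02, R12, d01, d02, d12):
    """Return (s, h1, h2) from model distances R and image distances d."""
    P = R01**2 + R02**2 - R12**2
    Q = d01**2 + d02**2 - d12**2
    a = 4 * R01**2 * R02**2 - P**2
    b = 2 * R01**2 * d02**2 + 2 * R02**2 * d01**2 - P * Q
    c = 4 * d01**2 * d02**2 - Q**2
    if a <= 0:
        raise ValueError("model points are collinear (a = 0)")
    # Scale factor; abs() guards the inner radicand against roundoff.
    s = math.sqrt((b + math.sqrt(abs(b * b - a * c))) / a)
    sigma = 1 if Q <= s**2 * P else -1
    # Altitudes; max(..., 0) is an added guard, not part of the text.
    h1 = math.sqrt(max((s * R01) ** 2 - d01**2, 0.0))
    h2 = sigma * math.sqrt(max((s * R02) ** 2 - d02**2, 0.0))
    return s, h1, h2

def camera_centered_points(i0, i1, i2, s, h1, h2, w=0.0):
    """Model points in camera coordinates; w is the unrecoverable offset."""
    (x0, y0), (x1, y1), (x2, y2) = i0, i1, i2
    return ((x0 / s, y0 / s, w / s),
            (x1 / s, y1 / s, (h1 + w) / s),
            (x2 / s, y2 / s, (h2 + w) / s))

# Hypothetical example: camera-frame points (0,0,0), (3,0,4), (0,5,12), scale 2.
s, h1, h2 = weak_perspective_pose(5.0, 13.0, math.sqrt(98.0),
                                  6.0, 10.0, math.sqrt(136.0))
print(s, h1, h2)
print(camera_centered_points((0.0, 0.0), (6.0, 0.0), (0.0, 10.0), s, h1, h2))
```

Note that only the six distances enter the pose computation; the image coordinates are needed only when the camera-centered points themselves are wanted.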

Next, I give an expression for the image location of a fourth model point. Originally,
the model points are in some arbitrary model coordinate frame. Also, the image points
are in a camera-centered coordinate frame in which the image serves as the x-y plane.
Denote the original, untransformed model points by $\vec{p}_i$, to distinguish them from the
camera-centered model points $\vec{m}_i$ shown in Fig. 2-2. Using $\vec{p}_0$, $\vec{p}_1$, and $\vec{p}_2$, solve the
following vector equation for the "extended affine coordinates," $(\alpha, \beta, \gamma)$, of $\vec{p}_3$:

    $\vec{p}_3 = \alpha(\vec{p}_1 - \vec{p}_0) + \beta(\vec{p}_2 - \vec{p}_0) + \gamma(\vec{p}_1 - \vec{p}_0) \times (\vec{p}_2 - \vec{p}_0) + \vec{p}_0$    (2.10)

Let $x_{01} = x_1 - x_0$, $y_{01} = y_1 - y_0$, $x_{02} = x_2 - x_0$, and $y_{02} = y_2 - y_0$. Then the image
location of the transformed and projected $\vec{p}_3$ is

    $(\alpha x_{01} + \beta x_{02} + \gamma(y_{01}H_2 - y_{02}H_1) + x_0,\ \ \alpha y_{01} + \beta y_{02} + \gamma(-x_{01}H_2 + x_{02}H_1) + y_0).$    (2.11)
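
The two steps for a fourth point, solving for the extended affine coordinates and then evaluating the image location, might be sketched as follows. The helper names are hypothetical, and the way the 3x3 system is solved (splitting off the cross-product direction, then a 2x2 Cramer's rule) is one convenient choice rather than the method prescribed by the text.

```python
def sub(u, v):
    return tuple(p - q for p, q in zip(u, v))

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def extended_affine(p0, p1, p2, p3):
    """Solve the vector equation for (alpha, beta, gamma) of p3."""
    u, v, q = sub(p1, p0), sub(p2, p0), sub(p3, p0)
    n = cross(u, v)                      # perpendicular to both u and v
    gamma = dot(q, n) / dot(n, n)
    uu, uv, vv = dot(u, u), dot(u, v), dot(v, v)
    qu, qv = dot(q, u), dot(q, v)
    det = uu * vv - uv * uv              # nonzero for a non-degenerate triangle
    alpha = (qu * vv - qv * uv) / det
    beta = (qv * uu - qu * uv) / det
    return alpha, beta, gamma

def fourth_point_image(i0, i1, i2, H1, H2, alpha, beta, gamma):
    """Image location of the transformed and projected fourth point."""
    (x0, y0), (x1, y1), (x2, y2) = i0, i1, i2
    x01, y01, x02, y02 = x1 - x0, y1 - y0, x2 - x0, y2 - y0
    x = alpha * x01 + beta * x02 + gamma * (y01 * H2 - y02 * H1) + x0
    y = alpha * y01 + beta * y02 + gamma * (-x01 * H2 + x02 * H1) + y0
    return x, y

# Hypothetical example: model frame equals camera frame, scale factor 2.
abg = extended_affine((0, 0, 0), (3, 0, 4), (0, 5, 12), (1, 2, 3))
print(fourth_point_image((0, 0), (6, 0), (0, 10), 4.0, 12.0, *abg))
```

No model-to-image transformation is formed at any point; the image location comes directly from the three matched image points and the two altitudes.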


2.3  Discussion  of  3D  Pose 

Section 2.4 will show the following results, in addition to deriving the 3D pose solution
given in the last section. The pose solution has a two-way ambiguity unless $h_1$ and $h_2$
are zero (Equation 2.7). The ambiguity corresponds to a reflection about a plane parallel
to the image plane. When $h_1 = h_2 = 0$, the model triangle (the triangle defined by the
three model points) is parallel to the image triangle (the triangle defined by the three
image points). As a note, $a$ and $c$ measure sixteen times the squares of the areas of
the model and image triangles, respectively. Further, the solution fails when the model
triangle degenerates to a line, in which case $a = 0$; in fact, this is the only instance in
which a solution may not exist (Section 2.5.3). Note that no such restriction is placed on
the image triangle; so the image points may be collinear. Note also that no restriction
is placed on the shape of the triangles, although the triangles in Fig. 2-2 are acute. For
illustration, Fig. 2-3 right shows a picture for when the model triangle is acute and the
image triangle is not, along with the smaller solid from Fig. 2-2.

Next, notice that all that is pertinent to recovering the 3D pose of the model are the
distances between the model and image points, not their locations. Previous solutions
have used the actual locations of the points to compute the pose, after first applying a
rigid transformation to put the three model points in the image plane [Huttenlocher87]
[Huttenlocher90] [Grimson92a].

In terms of the ordering of the points, the symmetry in Equation 2.6 shows that
the scale factor is the same, independent of the ordering. Previous methods that are based
on the coordinates of the points after some initial transformations make this symmetry
unclear. For the altitudes $H_1$ and $H_2$ (or $h_1$ and $h_2$), we can see from Fig. 2-2 how the
different orderings are related: In Fig. 2-2 the solution is based at $\vec{m}_0$, and the altitudes
are $H_1$ for $\vec{m}_1$, $H_2$ for $\vec{m}_2$, and 0 for $\vec{m}_0$. For a solution based at $\vec{m}_1$, the altitudes become
0 for $\vec{m}_1$, $H_2 - H_1$ for $\vec{m}_2$, and $-H_1$ for $\vec{m}_0$. For a solution based at $\vec{m}_2$, the altitudes
become $H_1 - H_2$ for $\vec{m}_1$, 0 for $\vec{m}_2$, and $-H_2$ for $\vec{m}_0$.


2.4  Existence  and  Uniqueness 

This section derives the 3D pose solution and shows that the solution exists for all sets of
model and image points except when the model points are collinear, and that the solution
is always unique. In deriving the 3D pose solution, I start with the basic geometry for the
weak-perspective three-point problem, shown in Fig. 2-3. There are three right triangles
in each solid, from which three constraints can be generated:

    $h_1^2 + d_{01}^2 = (sR_{01})^2$    (2.12)
    $h_2^2 + d_{02}^2 = (sR_{02})^2$    (2.13)
    $(h_1 - h_2)^2 + d_{12}^2 = (sR_{12})^2$    (2.14)

The distances $R_{01}$, $R_{02}$, $R_{12}$, $d_{01}$, $d_{02}$, $d_{12}$ and the scale factor $s$ are all positive, but the
altitudes $h_1$, $h_2$ along with $H_1$, $H_2$ are signed. Since $h_1$ and $h_2$ are signed, having "$h_1 - h_2$"
in the third equation is an arbitrary choice over "$h_1 + h_2$"; it was chosen because, when
$h_1$ and $h_2$ are positive, it directly corresponds to the pictures in Fig. 2-3.

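Multiplying the third equation by $-1$ and adding all three gives

    $2h_1h_2 = s^2(R_{01}^2 + R_{02}^2 - R_{12}^2) - (d_{01}^2 + d_{02}^2 - d_{12}^2).$    (2.15)

Squaring and using the first two equations again to eliminate $h_1$ and $h_2$, we have

    $4(s^2R_{01}^2 - d_{01}^2)(s^2R_{02}^2 - d_{02}^2) = \left(s^2(R_{01}^2 + R_{02}^2 - R_{12}^2) - (d_{01}^2 + d_{02}^2 - d_{12}^2)\right)^2,$    (2.16)

which, after some manipulation, leads to a biquadratic in $s$ (for details see Appendix B.1):

    $as^4 - 2bs^2 + c = 0,$    (2.17)

where

    $a = 4R_{01}^2R_{02}^2 - (R_{01}^2 + R_{02}^2 - R_{12}^2)^2$
    $\phantom{a} = (R_{01} + R_{02} + R_{12})(-R_{01} + R_{02} + R_{12})(R_{01} - R_{02} + R_{12})(R_{01} + R_{02} - R_{12})$

    $b = 2R_{01}^2d_{02}^2 + 2R_{02}^2d_{01}^2 - (R_{01}^2 + R_{02}^2 - R_{12}^2)(d_{01}^2 + d_{02}^2 - d_{12}^2)$
    $\phantom{b} = d_{01}^2(-R_{01}^2 + R_{02}^2 + R_{12}^2) + d_{02}^2(R_{01}^2 - R_{02}^2 + R_{12}^2) + d_{12}^2(R_{01}^2 + R_{02}^2 - R_{12}^2)$
    $\phantom{b} = R_{01}^2(-d_{01}^2 + d_{02}^2 + d_{12}^2) + R_{02}^2(d_{01}^2 - d_{02}^2 + d_{12}^2) + R_{12}^2(d_{01}^2 + d_{02}^2 - d_{12}^2)$

    $c = 4d_{01}^2d_{02}^2 - (d_{01}^2 + d_{02}^2 - d_{12}^2)^2$
    $\phantom{c} = (d_{01} + d_{02} + d_{12})(-d_{01} + d_{02} + d_{12})(d_{01} - d_{02} + d_{12})(d_{01} + d_{02} - d_{12})$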

In Fig. 2-2, let $\phi$ denote the angle between $\vec{m}_1 - \vec{m}_0$ and $\vec{m}_2 - \vec{m}_0$, and let $\psi$ be the angle
between $\vec{i}_1 - \vec{i}_0$ and $\vec{i}_2 - \vec{i}_0$. Notice by the law of cosines that

    $a = 4R_{01}^2R_{02}^2 - (2R_{01}R_{02}\cos\phi)^2 = 4(R_{01}R_{02}\sin\phi)^2$    (2.18)

    $b = 2R_{01}^2d_{02}^2 + 2R_{02}^2d_{01}^2 - (2R_{01}R_{02}\cos\phi)(2d_{01}d_{02}\cos\psi)$
    $\phantom{b} = 2(R_{01}^2d_{02}^2 + R_{02}^2d_{01}^2 - 2R_{01}R_{02}d_{01}d_{02}\cos\phi\cos\psi)$    (2.19)

    $c = 4d_{01}^2d_{02}^2 - (2d_{01}d_{02}\cos\psi)^2 = 4(d_{01}d_{02}\sin\psi)^2$    (2.20)

Further, $\frac{1}{2}R_{01}R_{02}\sin\phi$ equals the area of the model triangle, so that $a$ measures sixteen
times the square of the area of the model triangle. Analogously, $c$ measures sixteen times
the square of the area of the image triangle.
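
The equivalence of the two forms of a, and its relation to the triangle's area, is easy to confirm numerically. The sketch below uses hypothetical function names and a 3-4-5 triangle chosen only for convenience; the same functions applied to image distances give c.

```python
def coeff_expanded(R01, R02, R12):
    """4 R01^2 R02^2 - (R01^2 + R02^2 - R12^2)^2."""
    return 4 * R01**2 * R02**2 - (R01**2 + R02**2 - R12**2) ** 2

def coeff_factored(R01, R02, R12):
    """Product form (sixteen times Heron's formula for the squared area)."""
    return ((R01 + R02 + R12) * (-R01 + R02 + R12)
            * (R01 - R02 + R12) * (R01 + R02 - R12))

def triangle_area(p, q, r):
    """Area of a triangle from 2D vertex coordinates."""
    return abs((q[0] - p[0]) * (r[1] - p[1])
               - (r[0] - p[0]) * (q[1] - p[1])) / 2.0

# Right triangle with vertices (0,0), (4,0), (0,3): sides 4, 3, 5 and area 6.
area = triangle_area((0.0, 0.0), (4.0, 0.0), (0.0, 3.0))
print(coeff_expanded(4.0, 3.0, 5.0), coeff_factored(4.0, 3.0, 5.0), 16 * area**2)
```

Because the coefficient vanishes exactly when the area does, testing it against zero is a direct test for a degenerate (collinear) triangle.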

The biquadratic in Equation 2.17 is equivalent to the one originally derived by Ullman.
But Ullman made no attempt to interpret or decide among its solutions, which will be
done here. We are interested only in positive, real solutions for $s$, the scale factor. In
general, the positive solutions of the biquadratic are given by

    $s = \sqrt{\dfrac{b \pm \sqrt{b^2 - ac}}{a}}.$    (2.21)

Depending on the radicands, there will be zero, one, or two real solutions. Particularly,
we are interested in whether each number of solutions can arise, and, if so, to what the
solutions correspond geometrically.

In what follows, I assume that the model triangle is not degenerate, that is, not simply
a line or a point. This situation is the only time the solution is not guaranteed to exist
(see Section 2.5.3). Note that this assumption implies that $a \neq 0$ and $\phi \neq 0, \pi$.

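To begin, let us determine the signs of $a$, $b$, and $c$. From Equations 2.18 and 2.20,
clearly $a > 0$ and $c \ge 0$. From Equation 2.19, it is straightforward to see that $b \ge 0$,
since

    $b = 2(R_{01}^2d_{02}^2 + R_{02}^2d_{01}^2 - 2R_{01}R_{02}d_{01}d_{02}\cos\phi\cos\psi)$
    $\phantom{b} \ge 2(R_{01}^2d_{02}^2 + R_{02}^2d_{01}^2 - 2R_{01}R_{02}d_{01}d_{02})$, since $\cos\phi \le 1$, $\cos\psi \le 1$
    $\phantom{b} = 2(R_{01}d_{02} - R_{02}d_{01})^2 \ge 0$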

Via some algebra (given in Appendix B.3), it can be shown that

    $b^2 - ac = 4(R_{01}d_{02})^4\,(t^2 - 2\cos(\phi + \psi)\,t + 1)(t^2 - 2\cos(\phi - \psi)\,t + 1),$    (2.22)

where $t = \dfrac{R_{02}d_{01}}{R_{01}d_{02}}$, which leads to $b^2 - ac \ge 0$ (see Appendix B.3). From this fact and that
$a > 0$, $b \ge 0$, and $c \ge 0$, we can derive that there are in general two solutions for $s$ with
a single special case when $b^2 - ac = 0$, which can be seen as follows:

    $b^2 - ac \ge 0 \implies b \pm \sqrt{b^2 - ac} \ge 0$, since $b \ge 0$ and $ac \ge 0$
    $\phantom{b^2 - ac \ge 0} \implies \dfrac{b \pm \sqrt{b^2 - ac}}{a} \ge 0$, since $a > 0$

Hence $s = \sqrt{\dfrac{b \pm \sqrt{b^2 - ac}}{a}}$, which gives one or two solutions for the biquadratic, depending
on whether $b^2 - ac$ is equal to zero or is positive.

Next I show that of the two solutions for the scale, exactly one of them is valid, that
is, corresponds to an orthographic projection of the model points onto the image points.
Furthermore, the other solution arises from inverting the model and image distances in
Fig. 2-2. In addition, there being one solution for scale corresponds to the special case in
which the model triangle is parallel to the image plane. The following proposition will
be used to establish these claims.


Proposition 1:

    $s_1 = \sqrt{\dfrac{b - \sqrt{b^2 - ac}}{a}} \ \le\ \dfrac{d_{01}}{R_{01}},\ \dfrac{d_{02}}{R_{02}} \ \le\ s_2 = \sqrt{\dfrac{b + \sqrt{b^2 - ac}}{a}}$    (2.23)

Proof: $s_1$ and $s_2$ are solutions to the biquadratic in Equation 2.17. Since $a > 0$,
the quadratic function in $s^2$ on the left-hand side of Equation 2.17 is concave up and,
consequently, is negative exactly in the interval between the zeroes $s_1^2$ and $s_2^2$. Further,
by substitution it can be seen that this function takes on negative values for $s^2 = (d_{01}/R_{01})^2$
and $s^2 = (d_{02}/R_{02})^2$ (see Appendix B.2). Since the scale factors and the distances are
non-negative, this immediately gives that $d_{01}/R_{01}$ and $d_{02}/R_{02}$ lie between $s_1$ and $s_2$.

2.4.1  The true solution for scale


Here it is shown that exactly one of the two solutions for scale can satisfy the geometry
shown in Fig. 2-2, and it is always the same one. If the two solutions are the same, then
both solutions can satisfy the geometry (this case is discussed in Section 2.5.1). As will
be seen, the valid solution is

    $s_2 = \sqrt{\dfrac{b + \sqrt{b^2 - ac}}{a}}.$    (2.25)

Note that proving this statement establishes the existence and uniqueness of the pose
solution.

In Fig. 2-3, $(sR_{01})^2 - d_{01}^2 = h_1^2 \ge 0$ and $(sR_{02})^2 - d_{02}^2 = h_2^2 \ge 0$, which implies that
any solution $s$ satisfies $d_{01}/R_{01} \le s$ and $d_{02}/R_{02} \le s$. Consequently, Proposition 1 implies that $s_2$
is the only possible solution.
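
Proposition 1 and the resulting choice of root can be illustrated with a small numerical check. This is a sketch under assumptions: the function name and the sample distances are hypothetical, and clamping the radicands at zero is an added guard against roundoff.

```python
import math

def scale_roots(R01, R02, R12, d01, d02, d12):
    """Both positive roots (s1 <= s2) of the biquadratic a s^4 - 2b s^2 + c = 0."""
    P = R01**2 + R02**2 - R12**2
    Q = d01**2 + d02**2 - d12**2
    a = 4 * R01**2 * R02**2 - P**2
    b = 2 * R01**2 * d02**2 + 2 * R02**2 * d01**2 - P * Q
    c = 4 * d01**2 * d02**2 - Q**2
    disc = math.sqrt(max(b * b - a * c, 0.0))
    s1 = math.sqrt(max(b - disc, 0.0) / a)
    s2 = math.sqrt((b + disc) / a)
    return s1, s2

# Hypothetical distances consistent with a true scale factor of 2.
R01, R02, R12 = 5.0, 13.0, math.sqrt(98.0)
d01, d02, d12 = 6.0, 10.0, math.sqrt(136.0)
s1, s2 = scale_roots(R01, R02, R12, d01, d02, d12)
print(s1, d02 / R02, d01 / R01, s2)
```

Both ratios d01/R01 and d02/R02 fall between the roots, so only the larger root keeps the altitudes real, which is the argument made above.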

The question remains whether $s_2$ is itself a solution. The fact that it satisfies the
biquadratic is not sufficient since the squaring used to obtain Equation 2.16 from
Equation 2.15 may not be reversible. Yet we do know $s_2$ satisfies Equation 2.16, because the
steps from Equation 2.16 to Equation 2.21 are reversible. Consequently, Equation 2.15
will be satisfied if the sign of $h_2$ relative to $h_1$ is chosen accordingly. Let $\sigma$ be the sign of
$h_2$ when the sign of $h_1$ is 1, and $-\sigma$ be $h_2$'s sign when $h_1$'s sign is $-1$. Then unless the
right-hand side of this equation is 0, Equation 2.15 is satisfied by

    $\sigma = 1$   if $d_{01}^2 + d_{02}^2 - d_{12}^2 < s^2(R_{01}^2 + R_{02}^2 - R_{12}^2)$,
    $\sigma = -1$  if $d_{01}^2 + d_{02}^2 - d_{12}^2 > s^2(R_{01}^2 + R_{02}^2 - R_{12}^2)$.

If on the other hand $s^2(R_{01}^2 + R_{02}^2 - R_{12}^2) - (d_{01}^2 + d_{02}^2 - d_{12}^2) = 0$, Equation 2.15 implies
$h_1$ or $h_2$ is 0, so that the sign of $h_2$ relative to $h_1$ is arbitrary.


Notice that the collective sign of $h_1$ and $h_2$ is still free, and so there is a two-way
ambiguity in the pairs $(h_1, h_2)$ and $(H_1, H_2)$. As can be seen in Fig. 2-2, the ambiguity
geometrically corresponds to a flip of the plane containing the space points $\vec{m}_0$, $\vec{m}_1$, and
$\vec{m}_2$. The flip is about a plane in space that is parallel to the image plane, but which
plane it is cannot be determined since the problem gives no information about offsets of
the model in the z direction. Due to the reflection, for planar objects the two solutions
are equivalent, in that they give the same image points when projected. On the other
hand, for non-planar objects the two solutions project to two different sets of points.

There is a special case, as mentioned above, when the sign of $h_2$ is arbitrary relative
to the sign of $h_1$. In this case, the right-hand side of Equation 2.15 is zero, and this
implies that $h_1$ or $h_2$ is zero also. Looking at Fig. 2-2, geometrically what is occurring
is that one of the sides of the model triangle that emanates from $\vec{m}_0$ lies parallel to the
image plane, so that the reflective ambiguity is obtained by freely changing the sign of
the non-zero altitude.
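
The effect of the reflective ambiguity on additional model points can be seen by evaluating the fourth-point expression of Section 2.2 for both sign choices of the altitudes. The following is an illustrative sketch: the function name, image points, altitudes, and coordinates are all hypothetical values.

```python
def fourth_point_image(i0, i1, i2, H1, H2, alpha, beta, gamma):
    """Image location of a fourth point with extended affine coordinates."""
    (x0, y0), (x1, y1), (x2, y2) = i0, i1, i2
    x01, y01, x02, y02 = x1 - x0, y1 - y0, x2 - x0, y2 - y0
    x = alpha * x01 + beta * x02 + gamma * (y01 * H2 - y02 * H1) + x0
    y = alpha * y01 + beta * y02 + gamma * (-x01 * H2 + x02 * H1) + y0
    return x, y

i0, i1, i2 = (0.0, 0.0), (6.0, 0.0), (0.0, 10.0)
H1, H2 = 4.0, 12.0

# A point in the plane of the three model points has gamma = 0.
planar_a = fourth_point_image(i0, i1, i2, H1, H2, 0.5, 0.25, 0.0)
planar_b = fourth_point_image(i0, i1, i2, -H1, -H2, 0.5, 0.25, 0.0)

# A point off that plane has gamma != 0.
solid_a = fourth_point_image(i0, i1, i2, H1, H2, 0.5, 0.25, 0.1)
solid_b = fourth_point_image(i0, i1, i2, -H1, -H2, 0.5, 0.25, 0.1)
print(planar_a, planar_b, solid_a, solid_b)
```

The altitudes enter only through the gamma term, so coplanar points project identically under both solutions while points off the plane do not; a fourth non-coplanar match is therefore enough to tell the two poses apart.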


2.4.2  The  inverted  solution  for  scale 


Of the two solutions for scale that satisfy the biquadratic, we know that $s_2$ corresponds
to the geometry in Fig. 2-2, but what about $s_1$? Using a similar argument to that used
to prove $s_2$ is a solution for the weak-perspective geometry, we can infer a geometric
interpretation for $s_1$. Consider, then, $s = s_1$. The interpretation I will derive satisfies
the equations

    $H_1^2 + R_{01}^2 = (rd_{01})^2$    (2.26)
    $H_2^2 + R_{02}^2 = (rd_{02})^2$    (2.27)
    $(H_1 - H_2)^2 + R_{12}^2 = (rd_{12})^2$    (2.28)

where $r = \frac{1}{s}$. Observe that $r$ and $s_2$ have similar forms (compare to Equation 2.25):

    $r = \sqrt{\dfrac{a}{b - \sqrt{b^2 - ac}}} = \sqrt{\dfrac{b + \sqrt{b^2 - ac}}{c}}$    (2.29)

To begin the derivation, Proposition 1 gives that $d_{01}^2 - (sR_{01})^2 \ge 0$ and $d_{02}^2 - (sR_{02})^2 \ge 0$,
which implies we can set $h_1^2 = d_{01}^2 - (sR_{01})^2$ and $h_2^2 = d_{02}^2 - (sR_{02})^2$. Dividing through
by $s^2$ gives Equations 2.26 and 2.27. Since $s_1$ satisfies Equation 2.16 (for the same reason
$s_2$ did), we can substitute into Equation 2.16 with $h_1^2$ and $h_2^2$ to obtain
$(h_1 - h_2)^2 = d_{12}^2 - (sR_{12})^2$, where the sign of $h_2$ relative to $h_1$ is 1 if
$d_{01}^2 + d_{02}^2 - d_{12}^2 > s^2(R_{01}^2 + R_{02}^2 - R_{12}^2)$,
and $-1$ otherwise. Dividing through by $s^2$ gives Equation 2.28, and so the derivation is
completed.
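
The inverted geometry can be verified numerically for a particular configuration. This sketch assumes non-collinear image points (c > 0), uses hypothetical names and sample distances, and clamps radicands at zero as an added roundoff guard.

```python
import math

def inverted_solution(R01, R02, R12, d01, d02, d12):
    """Return (r, H1, H2) for the smaller root s1, with r = 1 / s1."""
    P = R01**2 + R02**2 - R12**2
    Q = d01**2 + d02**2 - d12**2
    a = 4 * R01**2 * R02**2 - P**2
    b = 2 * R01**2 * d02**2 + 2 * R02**2 * d01**2 - P * Q
    c = 4 * d01**2 * d02**2 - Q**2
    disc = math.sqrt(max(b * b - a * c, 0.0))
    r = math.sqrt((b + disc) / c)          # equals 1 / s1; requires c > 0
    s1 = 1.0 / r
    sign = 1 if Q > s1**2 * P else -1      # sign of H2 relative to H1
    H1 = math.sqrt(max((r * d01) ** 2 - R01**2, 0.0))
    H2 = sign * math.sqrt(max((r * d02) ** 2 - R02**2, 0.0))
    return r, H1, H2

R01, R02, R12 = 5.0, 13.0, math.sqrt(98.0)
d01, d02, d12 = 6.0, 10.0, math.sqrt(136.0)
r, H1, H2 = inverted_solution(R01, R02, R12, d01, d02, d12)
# Residual of the third inverted constraint; it should vanish up to roundoff.
print(r, H1, H2, (H1 - H2) ** 2 + R12**2 - (r * d12) ** 2)
```

In this example the altitudes have opposite signs, so the scaled-up image triangle straddles the model triangle's plane at the base point.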

Geometrically, Equation 2.26 forms a right triangle with sides $H_1$ and $R_{01}$, and
hypotenuse $r d_{01}$. Analogously, Equations 2.27 and 2.28 imply right triangles as well. The
interpretation is displayed in Fig. 2-4. Another way to see what is occurring geometrically
is to note that the roles of the model and image distances from Equations 2.12-2.14 are
inverted in Equations 2.26-2.28. In effect, what is happening is that instead of scaling
down the model triangle and projecting it orthographically onto the image triangle, the
image triangle is being scaled up and projected orthographically onto the model triangle,
where orthographically means projected along rays that are perpendicular to the model
triangle. This means we can rotate the solid in Fig. 2-4 so that the three model points
are in the image plane, and, as shown in Fig. 2-5, obtain for the inverted solution a
weak-perspective geometry that is analogous to the true geometry.

Figure: Geometrically interpreting the inverted solution for scale.
2.5  Special  Configurations  of  the  Points

2.5.1  Model  triangle  is  parallel  to  the  image  plane 

The two solutions for the scale factor are the same when $b^2 - ac = 0$, and here I
demonstrate that geometrically this corresponds to the plane containing the three model
points being parallel to the image plane. Before proving this, let us establish the existence
of the solution for scale in this special case of $b^2 - ac = 0$. Looking at Equation 2.25,

$$s = \sqrt{\frac{b}{a}}$$

is a solution to the biquadratic since $a > 0$ and $b > 0$.

Using Equation 2.22, it can be shown that $b^2 - ac = 0$ exactly when $\phi = \pm\psi$ or
$\phi = \pm\psi + \pi$ and $d_{01}/R_{01} = d_{02}/R_{02}$ (see Appendix B). From this result and
Equations 2.18 and 2.20,

$$s = \sqrt{\frac{|d_{01} d_{02} \sin\phi|}{|R_{01} R_{02} \sin\psi|}} = \frac{d_{01}}{R_{01}} = \frac{d_{02}}{R_{02}},$$

$$h_1 = \sqrt{(sR_{01})^2 - d_{01}^2} = 0,$$

$$h_2 = \sqrt{(sR_{02})^2 - d_{02}^2} = 0.$$

Thus $b^2 - ac = 0$ only if the model triangle is parallel to the image plane. Conversely, if
the model triangle is parallel to the image plane, it must be that $\phi = \pm\psi$. Further, in
this case $h_1 = h_2 = 0$, so that $s = d_{01}/R_{01} = d_{02}/R_{02}$, which implies that
$b^2 - ac = 0$.

Since the two solutions are the same, we know that $s_1 = s_2 = 1/r$. Notice in Fig. 2-2 left
and Fig. 2-4 that the geometric interpretations for the two solutions for scale collapse to
the same solution when $h_1 = h_2 = H_1 = H_2 = 0$ and $s = 1/r$ result. When there is
one solution for scale, there is also one solution for $(h_1, h_2)$ and $(H_1, H_2)$, albeit $(0, 0)$.


2.5.2  Model  triangle  is  perpendicular  to  the  image  plane 

The situation where the model triangle is perpendicular to the image plane is of interest
since the projection is a line. Note, however, that the solution given earlier makes no
exception for this case as long as the model triangle is not degenerate. As for what
happens in this case, since the image triangle is a line, we know $\phi = 0$ or $\phi = \pi$,
so that $c = 0$. Equation 2.21 becomes

$$a s^4 - 2 b s^2 = 0 \qquad (2.32)$$

From Section 2.1, of the two solutions for scale, the true one is $s = \sqrt{2b/a}$ and the
inverted one is 0.

To see why the inverted solution is zero, recall that the solution can be viewed as
scaling and projecting the image triangle onto the model triangle, using for scale
$r = 1/s_1$, which in this case does not exist. Since the image triangle is a line, graphically
this amounts to trying to scale a line so that it can project as a triangle, which is not possible.

Figure 2-6: Special case where model triangle is a line. The repeated labels correspond
to two different solutions for the position of the model that leave $s\vec{m}_1$ projecting onto
$\vec{i}_1$. For both solutions $s\vec{m}_2$ projects onto the same image point.


2.5.3  Model  triangle  is  a  line 

This is the one case where the solution for the scale fails, and it fails because $a$, which is
a measure of the area of the model triangle, is zero. Despite this fact, we can determine
when a solution exists. First, we know that the image triangle must be a line as well. To
see if this condition is enough, consider looking for a 3D rotation and scale that leaves
$s\vec{m}_1$ orthographically projecting onto $\vec{i}_1$ as in Fig. 2-6. Observe that every such
rotation and scale leaves $s\vec{m}_2$ projecting onto the same point in the image. This means
that for a solution to exist, it must be that $d_{02}/d_{01} = R_{02}/R_{01}$. Even when the image
triangle is a line, this in general is not true. When it is true, there is an infinity of
solutions corresponding to every scaled rotation that leaves $s\vec{m}_1$ projecting onto $\vec{i}_1$.

Another way to look at this situation is to notice that the model triangle being a line
when using the true solution is analogous to the image triangle being a line when using
the inverted solution. From the previous section, the scale factor for the inverted solution
does not exist unless $a = 0$, which supports that in this case the scale factor does not exist
unless $c = 0$.


2.6  Stability 


This section points out two situations to be careful of when using the pose solution.
The first is when the model points are nearly collinear, because the solution is near a
singularity (Section 2.5.3).

The second situation is when the model triangle is parallel to the image plane. Since
the pose solution is unique for any pair of model and image triangles, for each of the
three image points there is always some direction in which it can move such that its
corresponding model point undergoes 3D rotation without scale (see Fig. 2-7 left). In
general, when a model point is rotating in space around a line in the image plane, a
movement by its corresponding image point by an amount $\Delta x$ causes the altitude of the
model point to change by an amount $\Delta h$ (see Fig. 2-7 right), according to

$$x^2 + h^2 = (x - \Delta x)^2 + (h + \Delta h)^2 \qquad (2.33)$$

Expanding gives $\Delta h^2 + 2h\Delta h - 2x\Delta x + \Delta x^2 = 0$, which we can solve for $\Delta h$:

$$\Delta h = -h \pm \sqrt{h^2 + 2x\Delta x - \Delta x^2} \qquad (2.34)$$

When $\Delta x$ is small and $h = 0$, we have $\Delta h \approx \pm\sqrt{2x\Delta x}$. Thus, depending on
$\sqrt{2x}$, a small change in $x$ could lead to a large change in $h$, which may cause
instability. Note that this instability is inherent in the problem, and so care should be
taken when using any solution for 3D pose.
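
To make the size of the effect concrete, the sketch below evaluates Equation 2.34 for the same image displacement at two altitudes. The function name `altitude_change` and the numbers used (a point at distance 100 from the rotation axis, moved by 0.01) are assumptions chosen purely for illustration.

```python
import math

def altitude_change(x, h, dx):
    """Return the two solutions for the altitude change of Equation 2.34.

    x  : distance of the image point from the rotation axis
    h  : current altitude of the model point above the image plane
    dx : movement of the image point toward the axis
    """
    disc = h * h + 2.0 * x * dx - dx * dx
    if disc < 0.0:
        raise ValueError("no real solution: the image point moved too far")
    root = math.sqrt(disc)
    return (-h + root, -h - root)

# Same small image movement, once with the point in the image plane (h = 0)
# and once with the point well above it (h = 10).
for h in (0.0, 10.0):
    dh_plus, _ = altitude_change(x=100.0, h=h, dx=0.01)
    print(h, dh_plus)
```

For these values the altitude change at $h = 0$ is roughly 1.4, against roughly 0.1 at $h = 10$, so the same image error moves the model point more than ten times as far when the model triangle is parallel to the image plane.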
Figure 2-7: Unstable situation. Left: Looking straight down at the image. If $\vec{i}_2$ moves
along the perpendicular to the line between $\vec{i}_0$ and $\vec{i}_1$ while the other points remain fixed,
then $s\vec{m}_2$ will rotate in space around that line to follow it. Right: Looking from the side.
One of the model points is rotating in space around a line in the image plane.


2.7  Derivation  of  Direct  Alignment 

To compute the position in the image of a fourth model point, I first use the
weak-perspective pose solution to compute its 3D position in camera-centered coordinates. I
then project the camera-centered model point under weak perspective and obtain the
image position without having to calculate a model-to-image transformation. Let the
image points be $\vec{i}_0 = (x_0, y_0)$, $\vec{i}_1 = (x_1, y_1)$, and $\vec{i}_2 = (x_2, y_2)$. Given $s$, $h_1$,
and $h_2$, we can invert the projection to get the three model points:

$$\vec{m}_0 = \frac{1}{s}(x_0, y_0, w), \qquad \vec{m}_1 = \frac{1}{s}(x_1, y_1, h_1 + w), \qquad \vec{m}_2 = \frac{1}{s}(x_2, y_2, h_2 + w), \qquad (2.35)$$

where $w$ is an unknown offset in a direction normal to the image plane.

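Given three noncollinear 2D points, $\vec{q}_0$, $\vec{q}_1$, and $\vec{q}_2$, a fourth 2D point $\vec{q}_3$ can be
uniquely represented by its affine coordinates [Graustein30], $(\alpha, \beta)$, which are given by
the equation $\vec{q}_3 = \alpha(\vec{q}_1 - \vec{q}_0) + \beta(\vec{q}_2 - \vec{q}_0) + \vec{q}_0$. Given three noncollinear 3D
points, $\vec{p}_0$, $\vec{p}_1$, and $\vec{p}_2$, we can uniquely represent any other 3D point $\vec{p}_3$ in terms of
what I shall call its "extended affine coordinates," $(\alpha, \beta, \gamma)$:

$$\vec{p}_3 = \alpha(\vec{p}_1 - \vec{p}_0) + \beta(\vec{p}_2 - \vec{p}_0) + \gamma(\vec{p}_1 - \vec{p}_0) \times (\vec{p}_2 - \vec{p}_0) + \vec{p}_0 \qquad (2.36)$$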


Let $(\alpha, \beta, \gamma)$ be the extended affine coordinates of the fourth model point in terms of
the matched three, which I assume are noncollinear. Let $x_{01} = x_1 - x_0$, $y_{01} = y_1 - y_0$,
$x_{02} = x_2 - x_0$, and $y_{02} = y_2 - y_0$. Then, using the three camera-centered model points
such that $\vec{p}_0 = \vec{m}_0$, $\vec{p}_1 = \vec{m}_1$, and $\vec{p}_2 = \vec{m}_2$,

$$\vec{p}_1 - \vec{p}_0 = \frac{1}{s}(x_{01}, y_{01}, h_1) \qquad (2.37)$$

$$\vec{p}_2 - \vec{p}_0 = \frac{1}{s}(x_{02}, y_{02}, h_2) \qquad (2.38)$$

$$(\vec{p}_1 - \vec{p}_0) \times (\vec{p}_2 - \vec{p}_0) = \frac{1}{s^2}(y_{01}h_2 - y_{02}h_1,\; -x_{01}h_2 + x_{02}h_1,\; x_{01}y_{02} - x_{02}y_{01}) \qquad (2.39)$$

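Next, substitute Equations 2.37-2.39 into Equation 2.36 to get the 3D location of the
fourth point:

$$\begin{aligned}
\vec{m}_3 &= \frac{\alpha}{s}(x_{01}, y_{01}, h_1) + \frac{\beta}{s}(x_{02}, y_{02}, h_2) \\
&\quad + \frac{\gamma}{s^2}(y_{01}h_2 - y_{02}h_1,\; -x_{01}h_2 + x_{02}h_1,\; x_{01}y_{02} - x_{02}y_{01}) + \frac{1}{s}(x_0, y_0, w) \\
&= \frac{1}{s}\Big(\alpha x_{01} + \beta x_{02} + \gamma\frac{y_{01}h_2 - y_{02}h_1}{s} + x_0, \\
&\qquad\;\; \alpha y_{01} + \beta y_{02} + \gamma\frac{-x_{01}h_2 + x_{02}h_1}{s} + y_0, \\
&\qquad\;\; \alpha h_1 + \beta h_2 + \gamma\frac{x_{01}y_{02} - x_{02}y_{01}}{s} + w\Big)
\end{aligned} \qquad (2.40)$$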


Let $\Pi$ represent an orthogonal projection along the $z$ axis. To project, multiply by the
scale factor $s$ and drop the $z$ coordinate. With $H_1 = h_1/s$ and $H_2 = h_2/s$,

$$\Pi(s\vec{m}_3) = \big(\alpha x_{01} + \beta x_{02} + \gamma(y_{01}H_2 - y_{02}H_1) + x_0,\;\; \alpha y_{01} + \beta y_{02} + \gamma(-x_{01}H_2 + x_{02}H_1) + y_0\big) \qquad (2.41)$$

Notice that the unknown offset $w$ has dropped out. This expression computes the image
position of $\vec{p}_3$ from its extended affine coordinates, from the image points, and from $H_1$
and $H_2$, the altitudes in the weak-perspective geometry. It should be kept in mind that
the altitudes $H_1$ and $H_2$ depend on the specific imaging geometry; that is, they depend
on the pose of the model.
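
The direct alignment computation can be sketched in a few lines. The following is a minimal illustration, not the author's implementation: the function names `extended_affine_coords` and `direct_alignment` are hypothetical, points are plain tuples, and the altitudes `H1` and `H2` are assumed to have been obtained already from the pose solution.

```python
import math

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def extended_affine_coords(p0, p1, p2, p3):
    """Solve Equation 2.36 for (alpha, beta, gamma); p0, p1, p2 noncollinear."""
    a, b, d = sub(p1, p0), sub(p2, p0), sub(p3, p0)
    n = cross(a, b)                      # normal to the plane of p0, p1, p2
    gamma = dot(d, n) / dot(n, n)
    # n is orthogonal to a and b, so alpha and beta follow from a 2x2 system.
    aa, ab, bb = dot(a, a), dot(a, b), dot(b, b)
    da, db = dot(d, a), dot(d, b)
    det = aa * bb - ab * ab
    alpha = (da * bb - db * ab) / det
    beta = (aa * db - ab * da) / det
    return alpha, beta, gamma

def direct_alignment(i0, i1, i2, H1, H2, coords):
    """Image position of the fourth model point, following Equation 2.41."""
    alpha, beta, gamma = coords
    x0, y0 = i0
    x01, y01 = i1[0] - x0, i1[1] - y0
    x02, y02 = i2[0] - x0, i2[1] - y0
    x = alpha * x01 + beta * x02 + gamma * (y01 * H2 - y02 * H1) + x0
    y = alpha * y01 + beta * y02 + gamma * (-x01 * H2 + x02 * H1) + y0
    return x, y
```

The extended affine coordinates depend only on the model, so in a recognition system they could be computed once per model point and reused for every hypothesized match; only `H1` and `H2` change with the pose.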

Equation 2.35 gives the three matched model points in camera-centered coordinates
without having to compute a rigid 3D transformation. This should reduce the cost of
computing the camera-centered locations of these points, which will speed up recognition
systems to the extent this computation is important. Further advantages of the direct
method are that it is more intuitive because it is more directly connected to the geometry,
and it may be simpler to use.

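It may be worthwhile to observe that Equation 2.41, the expression for the fourth
point, can be rewritten as a weighted sum of the three image points:

$$\begin{aligned}
\Pi(s\vec{m}_3) &= \big(\alpha x_{01} + \beta x_{02} + \gamma(y_{01}H_2 - y_{02}H_1) + x_0,\;\; \alpha y_{01} + \beta y_{02} + \gamma(-x_{01}H_2 + x_{02}H_1) + y_0\big) \\
&= (\alpha x_1 + \gamma H_2 y_1,\; \alpha y_1 - \gamma H_2 x_1) - (\alpha x_0 + \gamma H_2 y_0,\; \alpha y_0 - \gamma H_2 x_0) \\
&\quad + (\beta x_2 - \gamma H_1 y_2,\; \beta y_2 + \gamma H_1 x_2) - (\beta x_0 - \gamma H_1 y_0,\; \beta y_0 + \gamma H_1 x_0) + (x_0, y_0) \\
&= \begin{bmatrix} 1 - \alpha - \beta & \gamma(H_1 - H_2) \\ -\gamma(H_1 - H_2) & 1 - \alpha - \beta \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}
+ \begin{bmatrix} \alpha & \gamma H_2 \\ -\gamma H_2 & \alpha \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}
+ \begin{bmatrix} \beta & -\gamma H_1 \\ \gamma H_1 & \beta \end{bmatrix} \begin{bmatrix} x_2 \\ y_2 \end{bmatrix}
\end{aligned}$$

Let $R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ represent a 2D rotation matrix that rotates by an angle $\theta$. Then

$$\Pi(s\vec{m}_3) = \delta_0 R_{\theta_0}\vec{i}_0 + \delta_1 R_{\theta_1}\vec{i}_1 + \delta_2 R_{\theta_2}\vec{i}_2 \qquad (2.42)$$

where

$$\delta_0 = \sqrt{(1 - \alpha - \beta)^2 + (\gamma(H_1 - H_2))^2} \qquad (2.43)$$

$$\delta_1 = \sqrt{\alpha^2 + (\gamma H_2)^2} \qquad (2.44)$$

$$\delta_2 = \sqrt{\beta^2 + (\gamma H_1)^2} \qquad (2.45)$$

$$\begin{aligned}
\cos\theta_0 &= \frac{1 - \alpha - \beta}{\delta_0}, & \sin\theta_0 &= \frac{-\gamma(H_1 - H_2)}{\delta_0}, \\
\cos\theta_1 &= \frac{\alpha}{\delta_1}, & \sin\theta_1 &= \frac{-\gamma H_2}{\delta_1}, \\
\cos\theta_2 &= \frac{\beta}{\delta_2}, & \sin\theta_2 &= \frac{\gamma H_1}{\delta_2}.
\end{aligned} \qquad (2.46)$$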
Thus, we can view the computation as a 2D rotation and scale of each image point
separately followed by a sum of the three. It is important to keep in mind, however, that
the rotations and scales themselves depend on the image points, because of $H_1$ and $H_2$.
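
The equivalence of the two forms can be checked numerically. The sketch below is illustrative only: it assumes the counterclockwise convention $R_\theta = [[\cos\theta, -\sin\theta], [\sin\theta, \cos\theta]]$, and the function names `equation_2_41` and `weighted_sum_form` are hypothetical.

```python
import math

def equation_2_41(i0, i1, i2, H1, H2, alpha, beta, gamma):
    """Image position of the fourth point written as in Equation 2.41."""
    x0, y0 = i0
    x01, y01 = i1[0] - x0, i1[1] - y0
    x02, y02 = i2[0] - x0, i2[1] - y0
    x = alpha * x01 + beta * x02 + gamma * (y01 * H2 - y02 * H1) + x0
    y = alpha * y01 + beta * y02 + gamma * (-x01 * H2 + x02 * H1) + y0
    return x, y

def rotate_scale(delta, theta, p):
    """Apply delta * R_theta to the 2D point p (counterclockwise rotation)."""
    c, s = math.cos(theta), math.sin(theta)
    return (delta * (c * p[0] - s * p[1]), delta * (s * p[0] + c * p[1]))

def weighted_sum_form(i0, i1, i2, H1, H2, alpha, beta, gamma):
    """Same position as a sum of rotated and scaled image points."""
    # (delta*cos(theta), delta*sin(theta)) for each of the three image points.
    parts = [
        (1.0 - alpha - beta, -gamma * (H1 - H2)),
        (alpha, -gamma * H2),
        (beta, gamma * H1),
    ]
    x = y = 0.0
    for (c_part, s_part), p in zip(parts, (i0, i1, i2)):
        delta = math.hypot(c_part, s_part)   # scale of this term
        theta = math.atan2(s_part, c_part)   # rotation of this term
        px, py = rotate_scale(delta, theta, p)
        x += px
        y += py
    return x, y
```

Calling both functions with the same arguments should give the same point up to rounding, which is a quick way to check the signs of the sine terms.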

When the model is planar, the form of Equation 2.42 facilitates understanding the
effects of error in the image points. Error in the locations of the matched image points
leads to uncertainty in the image location of the fourth model point. Suppose that the
true locations of the matched image points are known to be within a few, say $\epsilon_i$, pixels
of their nominal locations, for $i = 0, 1, 2$. Let $\vec{i}_i$ and $\vec{c}_i$ be the true and nominal locations
of an image point, for $i = 0, 1, 2$. Then, for some $\vec{e}_0$, $\vec{i}_0 = \vec{c}_0 + \vec{e}_0$, where
$\|\vec{e}_0\| = \epsilon_0$, and similarly for $\vec{i}_1$ and $\vec{i}_2$. Then

$$\begin{aligned}
\Pi(s\vec{m}_3) &= \delta_0 R_{\theta_0}\vec{i}_0 + \delta_1 R_{\theta_1}\vec{i}_1 + \delta_2 R_{\theta_2}\vec{i}_2 \\
&= (\delta_0 R_{\theta_0}\vec{c}_0 + \delta_1 R_{\theta_1}\vec{c}_1 + \delta_2 R_{\theta_2}\vec{c}_2) + (\delta_0 R_{\theta_0}\vec{e}_0 + \delta_1 R_{\theta_1}\vec{e}_1 + \delta_2 R_{\theta_2}\vec{e}_2)
\end{aligned}$$

When the fourth point is in the plane of the first three, $\gamma = 0$, so that the scales, $\delta_0$,
$\delta_1$, and $\delta_2$, and 2D rotations, $R_{\theta_0}$, $R_{\theta_1}$, and $R_{\theta_2}$, are all constant (see Equations
2.43-2.46). This means that the first term in parentheses is just the nominal image location
of the fourth model point. Since $\vec{e}_0$, $\vec{e}_1$, and $\vec{e}_2$ move around circles, the 2D rotations in
the second term can be ignored. Further, since these error vectors move independently
around their error circles, their radii simply sum together. Therefore, the region of possible
locations of the fourth model point is bounded by a circle of radius
$\delta_0\epsilon_0 + \delta_1\epsilon_1 + \delta_2\epsilon_2$ that is centered at the nominal point. By plugging $\gamma = 0$ into
Equations 2.43-2.45, we get that

$$\delta_0 = |1 - \alpha - \beta|, \qquad \delta_1 = |\alpha|, \qquad \delta_2 = |\beta|.$$

Assuming $\epsilon_0 = \epsilon_1 = \epsilon_2 = \epsilon$, this implies that the uncertainty in the image location of
a fourth point is bounded by a circle with radius $(|1 - \alpha - \beta| + |\alpha| + |\beta|)\epsilon$ and with its
center at the nominal point, which repeats the result given earlier by Jacobs [Jacobs91].
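
The planar bound is easy to evaluate. The following sketch, with hypothetical function names `planar_error_radius` and `planar_fourth_point` and made-up example values, computes the radius and the nominal location of a fourth point lying in the plane of the matched three.

```python
import math

def planar_error_radius(alpha, beta, eps):
    """Radius of the uncertainty circle for a planar fourth point,
    assuming each matched image point is within eps of its nominal location."""
    return (abs(1.0 - alpha - beta) + abs(alpha) + abs(beta)) * eps

def planar_fourth_point(i0, i1, i2, alpha, beta):
    """Nominal image location of a planar fourth point (gamma = 0)."""
    w0 = 1.0 - alpha - beta
    return (w0 * i0[0] + alpha * i1[0] + beta * i2[0],
            w0 * i0[1] + alpha * i1[1] + beta * i2[1])

# A fourth point outside the matched triangle has a larger bound than one inside.
print(planar_error_radius(0.3, 0.3, 2.0))    # affine coordinates inside the triangle
print(planar_error_radius(1.5, -0.25, 2.0))  # affine coordinates outside the triangle
```

When the affine coordinates and their complement all lie in $[0, 1]$ the three weights sum to one and the radius equals $\epsilon$; outside the triangle the weights grow in magnitude and the bound grows with them.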

Although the non-planar case clearly is more complicated, since the scales and 2D
rotations are no longer constant, Equation 2.42 may prove useful for obtaining bounds
on the effects of error in this situation as well.


2.8  Review  of  Previous  Solutions 


There have been several earlier solutions to the weak-perspective three-point problem,
notably by Kanade and Kender [Kanade83], Cyganski and Orr [Cyganski85] [Cyganski88],
Ullman [Ullman86] [Huttenlocher87], Huttenlocher and Ullman [Huttenlocher88]
[Huttenlocher90] [Ullman89], and Grimson, Huttenlocher, and Alter [Grimson92a]. All
the previous solutions compute the 3D pose by going through a 3D rigid transformation
or a 2D affine transformation relating the model to the image. A 2D affine transform is
a linear transform plus a translation, and it can be applied to any object lying in the
plane. All but Ullman's and Grimson, Huttenlocher, and Alter's solutions compute an
affine transformation between the three model and image points. Also, all but Kanade
and Kender's solution compute a model-to-image rigid transformation, either via a
rotation matrix or via Euler angles.

Not all of the solutions directly solve the weak-perspective three-point problem. The
earliest solution, which was given by Kanade and Kender in 1983, applies Kanade's
skewed-symmetry constraint to recover the 3D orientation of a symmetric, planar
pattern [Kanade83]. More precisely, Kanade and Kender showed how to compute the 3D
orientation of the plane containing a symmetric, planar pattern from a 2D affine
transform between an image of the pattern and the pattern itself. To apply this result to
the weak-perspective three-point problem, the three points can be used to construct a
symmetric, planar pattern, and a 2D affine transform can be computed from two sets of
three corresponding points. The solution was shown to exist and to give two solutions
related by a reflective ambiguity, assuming that the determinant of the affine transform
is positive.

The remaining methods all concentrate on computing the 3D rigid transform from the
model to the image. In 1985, while presenting a system for recognizing planar objects,
Cyganski and Orr showed how to use higher-order moments to compute a 2D affine
transform between planar regions [Cyganski85] [Cyganski88]. Given the affine transform,
they listed expressions for computing the 3D Euler angles from the 2D affine transform.¹
They did not, however, discuss how they derived the expressions.

The next method is the solution given by Ullman in 1986 [Ullman86], which appeared
again in [Huttenlocher87]. The paper included a proof that the solution for the scale
factor is unique and the solution for the rotation matrix is unique up to an inherent
two-way ambiguity. (This corresponds to the ambiguity in $H_1$ and $H_2$.) Yet Ullman did not
show the solution exists. When it does exist, Ullman described a method for obtaining
the rotation matrix and scale factor.

In 1988, Huttenlocher and Ullman gave another solution, and, in the process, gave
the first complete proof that the solution both exists and is unique (up to the two-way
ambiguity) [Huttenlocher88] [Huttenlocher90] [Ullman89]. Like Kanade and Kender, and
Cyganski and Orr, Huttenlocher and Ullman's solution relies on a 2D affine transform.
The solution itself is based on algebraic constraints derived from rigidity, which are used
to recover the elements of the scaled rotation matrix.

The last solution, which was published this year, was developed by Grimson,
Huttenlocher, and Alter for the purpose of analyzing the effects of image noise on error in
transformation space [Grimson92a]. Towards this end, the method facilitates computing
how a small perturbation in each transformation parameter propagates to uncertainty
ranges in the other parameters.

¹ The expressions that appear in [Cyganski85] contain typesetting errors, but are listed
correctly in [Cyganski88].


2.9  Presentation  of  Three  Previous  Solutions 


The solutions discussed in the previous section differ significantly in how they compute
the transformation, and, as a result, each one can provide different insights into solving
related problems, such as error analysis in alignment-based recognition and pose
clustering. It seems useful, then, to present the previous solutions in detail, so they
conveniently can be referred to and compared.

The first method presented is Ullman's solution, which the first part of this chapter
extended. After that, I give Huttenlocher and Ullman's solution. Lastly, I present the
method of Grimson, Huttenlocher, and Alter. I do not present Kanade and Kender's
method nor Cyganski and Orr's, because Kanade and Kender did not directly solve
the weak-perspective three-point problem, and Cyganski and Orr did not detail their
solution.

It should be pointed out that the presentations here differ somewhat from the ones
given by the original authors, but the ideas are the same. Basically, the presentations
emphasize the steps that recover the 3D pose while being complete and concise.

In the following presentations, we are looking for a rigid transform plus scale that
aligns the model points to the image points. In all methods, we are free to move rigidly
the three image points or the three model points wherever we wish, since this amounts
to tacking on an additional transform before or after the aligning one. For example, this
justifies the assumption made below that the plane of the model points is parallel to the
image plane.
For consistency, the same notation as in Sections 2.2 and 2.4 is used in the proofs
that follow: Let the model points be $\vec{m}_0$, $\vec{m}_1$, $\vec{m}_2$ and the image points be $\vec{i}_0$, $\vec{i}_1$, $\vec{i}_2$,
with the respective distances between the points being $R_{01}$, $R_{02}$, and $R_{12}$ for the model
points, and $d_{01}$, $d_{02}$, and $d_{12}$ for the image points.


2.9.1  Overview 

Initially, all three methods compute a transformation that brings the model into image
coordinates, such that the plane of the three matched model points is parallel to the image
plane and such that $\vec{m}_0$ projects onto $\vec{i}_0$, which has been translated to the origin. The
three methods then compute the out-of-plane rotation and scale that align the matched
model and image points. In so doing, the methods all end up solving a biquadratic
equation.

In Ullman's method, the model and image points are further transformed via rotations
around the $z$ axis to align $\vec{m}_1$ and $\vec{i}_1$ along the $x$ axis. Then the 3D rotation matrix for
rotating successively around the $x$ and $y$ axes is expressed in terms of Euler angles.
This leads to a series of three equations in three unknowns, which are solved to get a
biquadratic in the scale factor. To get the elements of the rotation matrix, the solution
for scale factor is substituted back into the original three equations.

Instead of further rotating the model and image points, Huttenlocher and Ullman
compute an affine transform between them, which immediately gives the top-left
sub-matrix of the scaled rotation matrix. Then by studying what happens to two
equal-length vectors in the plane, a biquadratic is obtained. The scale factor and the remaining
elements of the scaled rotation matrix are found using the algebraic constraints on the
columns of a scaled rotation matrix.

Like Ullman did, Grimson, Huttenlocher, and Alter rotate the model further to align
$\vec{m}_1$ and $\vec{i}_1$. The desired out-of-plane rotation is expressed in terms of two angles that
give the rotation about two perpendicular axes in the plane. Next, Rodrigues' formula,
which computes the 3D rotation of a point about some axis, is used to eliminate the scale
factor and obtain two constraints on the two rotation angles. The two constraints are
solved to get a biquadratic in the cosine of one of the angles. Its solution is substituted
back to get the other angle and the scale factor, which can be used directly by Rodrigues'
formula to transform any other model point.

As mentioned in the introduction, Ullman's solution is incomplete because it does
not show which of the two solutions for the scale factor is correct; actually, the solution
is completed by the result given in Section 2.1.1 of this chapter. Similar to Ullman's
method, Grimson, Huttenlocher, and Alter's solution has the same drawback of not
showing which solution to its biquadratic is correct. Huttenlocher and Ullman, on the
other hand, have no such problem because it turns out that one of the two solutions to
their biquadratic is obviously not real, and so it immediately is discarded.


2.9.2  Ullman’s  method 

This section gives Ullman's solution to the weak-perspective three-point problem. The
main idea is first to transform the three model points to the image plane and then solve
for the scale and out-of-plane rotation that align the transformed points.

Specifically, the model points first are rigidly transformed to put the three model
points in the image plane with $\vec{m}_0$ at the origin of the image coordinate system and
$\vec{m}_1 - \vec{m}_0$ aligned with the $x$ axis. After rigidly transforming the model points, the
resulting points can be represented by $(0, 0, 0)$, $(\bar{x}_1, 0, 0)$, and $(\bar{x}_2, \bar{y}_2, 0)$. Similarly, let
the image points be rigidly transformed to put $\vec{i}_0$ at the origin and $\vec{i}_1 - \vec{i}_0$ along the $x$ axis,
and let the resulting image points be $(0, 0, 0)$, $(x_1, 0, 0)$, and $(x_2, y_2, 0)$.

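Next, we break the out-of-plane rotation into a rotation around the $x$ axis by an
angle $\theta$ followed by a rotation around the $y$ axis by an angle $\phi$, as pictured in Fig. 2-8.
The corresponding rotation matrix is

$$R = \begin{bmatrix} \cos\phi & 0 & \sin\phi \\ 0 & 1 & 0 \\ -\sin\phi & 0 & \cos\phi \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}
= \begin{bmatrix} \cos\phi & \sin\phi\sin\theta & \sin\phi\cos\theta \\ 0 & \cos\theta & -\sin\theta \\ -\sin\phi & \cos\phi\sin\theta & \cos\phi\cos\theta \end{bmatrix} \qquad (2.47)$$

After rotation and scale, $(0, 0, 0)$, $(\bar{x}_1, 0, 0)$, and $(\bar{x}_2, \bar{y}_2, 0)$ become $(0, 0, 0)$, $(x_1, 0, z_1)$, and
$(x_2, y_2, z_2)$, respectively, where $z_1$ and $z_2$ are unknown. Thus, we need to find $\theta$, $\phi$, and
$s$ such that

$$sR(\bar{x}_1, 0, 0) = (x_1, 0, z_1)$$

$$sR(\bar{x}_2, \bar{y}_2, 0) = (x_2, y_2, z_2)$$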
Figure 2-8: Interpreting the out-of-plane rotation angles in Ullman's method.


Expanding the first two rows of $R$ yields three equations in three unknowns:

$$s\bar{x}_1\cos\phi = x_1 \qquad (2.48)$$

$$s\bar{y}_2\cos\theta = y_2 \qquad (2.49)$$

$$s\bar{x}_2\cos\phi + s\bar{y}_2\sin\phi\sin\theta = x_2 \qquad (2.50)$$

Fig. 2-8 gives a graphical interpretation of the first two equations. Substituting
Equations 2.48 and 2.49 along with expressions for $\sin\phi$ and $\sin\theta$ into Equation 2.50 yields a
biquadratic in the scale factor $s$:

$$a s^4 - b s^2 + c = 0, \qquad (2.51)$$

where

$$a = \bar{x}_1^2\bar{y}_2^2 \qquad (2.52)$$

$$b = \bar{x}_1^2(x_2^2 + y_2^2) + x_1^2(\bar{x}_2^2 + \bar{y}_2^2) - 2x_1x_2\bar{x}_1\bar{x}_2 \qquad (2.53)$$

$$c = x_1^2y_2^2 \qquad (2.54)$$

The positive solutions for $s$ are given by

$$s = \sqrt{\frac{b \pm \sqrt{b^2 - 4ac}}{2a}} \qquad (2.55)$$

In general there can be one, two, or no solutions for $s$. Ullman makes no further attempt
to determine when or if each solution arises, except to refer to a uniqueness proof he gives
earlier in the paper. The uniqueness proof implies there can be at most one solution for
$s$, but does not say which solution it is or whether it can be either one at different times.

Given $s$, the rotation matrix $R$ is obtained using $\cos\phi = x_1/(s\bar{x}_1)$ and
$\cos\theta = y_2/(s\bar{y}_2)$ in Equation 2.47. One difficulty with this is that we do not know the signs of
$\sin\theta$ and $\sin\phi$; this leaves four possibilities for the pair $(\sin\theta, \sin\phi)$. In his uniqueness
proof, Ullman points out that the inherent reflective ambiguity corresponds to multiplying
simultaneously the elements $r_{13}$, $r_{23}$, $r_{31}$, and $r_{32}$ of $R$ by $-1$. In Equation 2.47, the signs
of those elements also are inverted when both $\sin\theta$ and $\sin\phi$ are multiplied by $-1$, which,
visually, corresponds to reflecting the model points about the image plane (Fig. 2-8).
Still, we have no way to know which of the two pairs of solutions is correct. One way to
proceed is to try both and see which solution pair aligns the points.
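
The scale computation in this method can be sketched as follows. This is a minimal illustration under stated assumptions, not Ullman's implementation: the coefficient formulas follow from eliminating the angles from the three constraints, the function names `ullman_coefficients`, `ullman_scales`, and `project_with_pose` are hypothetical, and the inputs are assumed to be already in the canonical positions described above (model points $(0,0,0)$, $(\bar{x}_1,0,0)$, $(\bar{x}_2,\bar{y}_2,0)$ and image points $(0,0)$, $(x_1,0)$, $(x_2,y_2)$).

```python
import math

def ullman_coefficients(xb1, xb2, yb2, x1, x2, y2):
    """Coefficients of a*s**4 - b*s**2 + c = 0 (Equations 2.52-2.54)."""
    a = (xb1 * yb2) ** 2
    b = (xb1 ** 2 * (x2 ** 2 + y2 ** 2)
         + x1 ** 2 * (xb2 ** 2 + yb2 ** 2)
         - 2.0 * x1 * x2 * xb1 * xb2)
    c = (x1 * y2) ** 2
    return a, b, c

def ullman_scales(xb1, xb2, yb2, x1, x2, y2):
    """Positive real solutions for the scale, smallest first (Equation 2.55)."""
    a, b, c = ullman_coefficients(xb1, xb2, yb2, x1, x2, y2)
    if a == 0.0:
        raise ValueError("degenerate model triangle (collinear points)")
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return []
    root = math.sqrt(disc)
    candidates = ((b - root) / (2.0 * a), (b + root) / (2.0 * a))
    return [math.sqrt(u) for u in candidates if u > 0.0]

def project_with_pose(xb1, xb2, yb2, s, theta, phi):
    """Image coordinates (x1, x2, y2) produced by the rotation of Equation 2.47
    followed by scaling by s and dropping the z coordinate."""
    x1 = s * xb1 * math.cos(phi)
    x2 = s * (xb2 * math.cos(phi) + yb2 * math.sin(phi) * math.sin(theta))
    y2 = s * yb2 * math.cos(theta)
    return x1, x2, y2

# Synthetic check: build an image from a known pose and recover the scale.
x1, x2, y2 = project_with_pose(2.0, 1.0, 3.0, s=0.5, theta=0.4, phi=0.6)
print(ullman_scales(2.0, 1.0, 3.0, x1, x2, y2))
```

In this synthetic example the known scale 0.5 appears as the larger of the two candidates, which agrees with the result from the first part of this chapter that the true solution corresponds to the larger root.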


2.9.3  Huttenlocher  and  Ullman’s  method 


First, assume the plane containing the model points is parallel to the image plane. Then
subtract out $\vec{m}_0$ and $\vec{i}_0$ from the model and image points, respectively, to align them
at the origin. Let the resulting model points be $(0, 0, 0)$, $(\bar{x}_1, \bar{y}_1, 0)$, and $(\bar{x}_2, \bar{y}_2, 0)$, and
the resulting image points be $(0, 0)$, $(x_1, y_1)$, and $(x_2, y_2)$. At this point, what is left is
to compute the scaled rotation matrix that brings $(\bar{x}_1, \bar{y}_1, 0)$ and $(\bar{x}_2, \bar{y}_2, 0)$ to $(x_1, y_1, z_1)$
and $(x_2, y_2, z_2)$, respectively, where $z_1$ and $z_2$ are unknown. That is, we need

$$sR(\bar{x}_1, \bar{y}_1, 0) = (x_1, y_1, z_1)$$

$$sR(\bar{x}_2, \bar{y}_2, 0) = (x_2, y_2, z_2).$$

Letting $l_{11} = sr_{11}$, $l_{12} = sr_{12}$, etc., and focusing on the first two rows of the rotation
matrix, we get two sets of equations:

$$l_{11}\bar{x}_1 + l_{12}\bar{y}_1 = x_1 \qquad (2.56)$$

$$l_{11}\bar{x}_2 + l_{12}\bar{y}_2 = x_2 \qquad (2.57)$$

$$l_{21}\bar{x}_1 + l_{22}\bar{y}_1 = y_1 \qquad (2.58)$$

$$l_{21}\bar{x}_2 + l_{22}\bar{y}_2 = y_2, \qquad (2.59)$$

which give $\begin{bmatrix} l_{11} & l_{12} \\ l_{21} & l_{22} \end{bmatrix}$, the top-left sub-matrix of the scaled rotation matrix. Note that
this step fails if the determinant, $\bar{x}_1\bar{y}_2 - \bar{x}_2\bar{y}_1$, equals zero.

Figure 2-9: Projecting two orthogonal same-length vectors in Huttenlocher and
Ullman's method.

Next, we make a digression to consider what happens to two orthogonal, equal-length
vectors in the plane, $\vec{e}_1$ and $\vec{e}_2$. Since $\vec{e}_1$ and $\vec{e}_2$ are in the plane, we can apply the
sub-matrix just computed to obtain the resulting vectors, $\vec{e}_1'$ and $\vec{e}_2'$:

$$\vec{e}_1' = \begin{bmatrix} l_{11} & l_{12} \\ l_{21} & l_{22} \end{bmatrix}\vec{e}_1, \qquad
\vec{e}_2' = \begin{bmatrix} l_{11} & l_{12} \\ l_{21} & l_{22} \end{bmatrix}\vec{e}_2 \qquad (2.60)$$

When a model is transformed, $\vec{e}_1$ and $\vec{e}_2$ undergo a rigid transformation plus scale before
projection. As shown in Fig. 2-9, after transformation these vectors become $\vec{e}_1' + c_1\hat{z}$ and
$\vec{e}_2' + c_2\hat{z}$. Since a scaled, rigid transform preserves angles and ratios of lengths between
vectors, and since $\vec{e}_1 \cdot \vec{e}_2 = 0$ and $\|\vec{e}_1\| = \|\vec{e}_2\|$, it must be that

$$(\vec{e}_1' + c_1\hat{z}) \cdot (\vec{e}_2' + c_2\hat{z}) = 0$$

$$\|\vec{e}_1' + c_1\hat{z}\| = \|\vec{e}_2' + c_2\hat{z}\|$$

These two equations simplify to

$$c_1c_2 = k_1$$

$$c_1^2 - c_2^2 = k_2$$

where

$$k_1 = -\vec{e}_1' \cdot \vec{e}_2' \qquad (2.61)$$

$$k_2 = \|\vec{e}_2'\|^2 - \|\vec{e}_1'\|^2 \qquad (2.62)$$

Substituting for $c_2 = k_1/c_1$ in the second equation leads to a biquadratic in $c_1$:

$$c_1^4 - k_2c_1^2 - k_1^2 = 0 \qquad (2.63)$$

The general solution is

$$c_1 = \pm\sqrt{\tfrac{1}{2}\left(k_2 \pm \sqrt{k_2^2 + 4k_1^2}\right)}.$$

Conveniently, the inner discriminant always is greater than or equal to zero. Furthermore,
since $4k_1^2 \geq 0$, the real solutions are given by

$$c_1 = \pm\sqrt{\tfrac{1}{2}\left(k_2 + \sqrt{k_2^2 + 4k_1^2}\right)}, \qquad (2.64)$$

since otherwise the outer discriminant is less than zero.

These two solutions for $c_1$ give two corresponding solutions for $c_2$, which from Fig. 2-9
can be seen to correspond to a reflection about the image plane.

The solution for $c_2$ does not work when $c_1 = 0$. In this case,

$$c_2 = \pm\sqrt{-k_2} = \pm\sqrt{\|\vec{e}_1'\|^2 - \|\vec{e}_2'\|^2}. \qquad (2.65)$$

This gives two solutions for $c_2$, if it exists, which can be seen as follows. Since $c_1 = 0$,
$\vec{e}_1$ ends up in the plane, so that the length of $\vec{e}_1$ is just scaled down by $s$, whereas
the length of $\vec{e}_2$ reduces both by being scaled down and by projection. Consequently,
$\|\vec{e}_2'\| \leq \|\vec{e}_1'\|$, and, therefore, $c_2$ exists.

Given $c_1$ and $c_2$, we can recover two more elements of the scaled rotation matrix.
Since $\vec{e}_1$ and $\vec{e}_2$ are in the plane, we know that $sR\vec{e}_1 = \vec{e}_1' + c_1\hat{z}$ and $sR\vec{e}_2 = \vec{e}_2' + c_2\hat{z}$.
Focusing on the last row of the scaled rotation matrix, we get the two equations $l_{31} = c_1$
and $l_{32} = c_2$.

At this point, we have the first two columns of $sR$, and, from the constraints on the
columns of a rotation matrix, we can get the last column from the cross product of the
first two. In total, this gives

$$sR = \begin{bmatrix}
l_{11} & l_{12} & \frac{1}{s}(c_2 l_{21} - c_1 l_{22}) \\
l_{21} & l_{22} & \frac{1}{s}(c_1 l_{12} - c_2 l_{11}) \\
c_1 & c_2 & \frac{1}{s}(l_{11} l_{22} - l_{21} l_{12})
\end{bmatrix}. \tag{2.66}$$

Since the columns of a rotation matrix have unit length, we know

$$s = \sqrt{l_{11}^2 + l_{21}^2 + c_1^2} = \sqrt{l_{12}^2 + l_{22}^2 + c_2^2}. \tag{2.67}$$

Notice that the ambiguity in $c_1$ and $c_2$ inverts the signs of the appropriate elements of
the rotation matrix as discussed in Section 2.9.2.
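The derivation above can be sketched in a few lines of Python. This is only an illustrative sketch, not the original implementation: the function name `hu_scaled_rotation` and the tolerance `tol` are hypothetical, the model points are assumed to be already transformed so that the model plane is parallel to the image plane with $\vec{m}_0$ and $\vec{\imath}_0$ at the origin, and the unit vectors are taken as $\vec{e}_1 = (1, 0)$ and $\vec{e}_2 = (0, 1)$ so that the last row of $sR$ begins with $(c_1, c_2)$.

```python
import math

def hu_scaled_rotation(m1, m2, i1, i2, tol=1e-12):
    """Return both candidate (s, sR) pairs; sR is a 3x3 matrix as a list of rows.

    m1, m2: (x, y) model points, with m0 at the origin and the model plane
            parallel to the image plane (assumed preprocessing).
    i1, i2: (x, y) image points, with i0 at the origin.
    """
    (x1, y1), (x2, y2) = m1, m2
    (u1, v1), (u2, v2) = i1, i2
    det = x1 * y2 - x2 * y1
    if abs(det) < tol:
        raise ValueError("determinant is zero; the 2x2 block cannot be solved")
    # Top-left 2x2 block of sR, solved by Cramer's rule.
    l11 = (u1 * y2 - u2 * y1) / det
    l12 = (x1 * u2 - x2 * u1) / det
    l21 = (v1 * y2 - v2 * y1) / det
    l22 = (x1 * v2 - x2 * v1) / det
    # With e1 = (1, 0) and e2 = (0, 1), e1' and e2' are the columns of the block.
    k1 = -(l11 * l12 + l21 * l22)                    # Eq. 2.61
    k2 = (l12**2 + l22**2) - (l11**2 + l21**2)       # Eq. 2.62
    inner = math.sqrt(k2**2 + 4 * k1**2)
    c1 = math.sqrt(max(0.5 * (k2 + inner), 0.0))     # Eq. 2.64, positive root
    if c1 > tol:
        c2 = k1 / c1
    else:
        c1, c2 = 0.0, math.sqrt(max(-k2, 0.0))       # Eq. 2.65
    solutions = []
    for sign in (1.0, -1.0):                         # reflection about the image plane
        a, b = sign * c1, sign * c2
        s = math.sqrt(l11**2 + l21**2 + a**2)        # Eq. 2.67
        sR = [[l11, l12, (b * l21 - a * l22) / s],   # Eq. 2.66
              [l21, l22, (a * l12 - b * l11) / s],
              [a,   b,   (l11 * l22 - l21 * l12) / s]]
        solutions.append((s, sR))
    return solutions
```

The two returned matrices differ only in the signs of the entries that involve $c_1$ and $c_2$, which is the reflective ambiguity noted above. The fixed tolerance is a simplification; near $c_1 = 0$ a scale-aware threshold would likely be more robust.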


2.9.4  Grimson,  Huttenlocher,  and  Alter’s  method 

Grimson et al. gave another solution to the weak-perspective three-point problem in
order to get a handle on how small perturbations affect the individual transformation
parameters.

To start, assume the plane containing the model points is parallel to the image plane.
Next, rigidly transform the model points so that $\vec{m}_0$ projects to $\vec{\imath}_0$ and $\vec{m}_1 - \vec{m}_0$ projects
along $\vec{\imath}_1 - \vec{\imath}_0$. Let $\Pi$ represent an orthogonal projection along the $z$ axis, and in general
let $\vec{v}^{\perp}$ be the 2D vector rotated ninety degrees clockwise from the 2D vector $\vec{v}$. Then the
translation is $\vec{\imath}_0 - \Pi\vec{m}_0$, and the rotation is about $\hat{z}$ by an angle $\psi$ given by

$$\cos\psi = \hat{m}_{01} \cdot \hat{\imath}_{01}, \qquad \sin\psi = -\hat{m}_{01} \cdot \ldots$$

(see Fig. 2-10).

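At this point, assign $\vec{m}_{01} = \vec{m}_1 - \vec{m}_0$, $\vec{m}_{02} = \vec{m}_2 - \vec{m}_0$, $\vec{\imath}_{01} = \vec{\imath}_1 - \vec{\imath}_0$, and $\vec{\imath}_{02} = \vec{\imath}_2 - \vec{\imath}_0$.

Also, consider the out-of-plane rotation to be a rotation about $\hat{\imath}_{01}$ by some angle $\theta$ followed
by a rotation about $\hat{\imath}_{01}^{\perp}$ by some angle $\phi$. Let us compute where the vectors $\hat{\imath}_{01}$ and $\hat{\imath}_{01}^{\perp}$
project to after the two rotations and scale. To do this, we use Rodrigues' formula: Let
$R_{\hat{v},\tau}\,\vec{p}$ represent a rotation of a point $\vec{p}$ about a direction $\hat{v}$ by an angle $\tau$. Rodrigues'
formula is

$$R_{\hat{v},\tau}\,\vec{p} = \cos\tau\,\vec{p} + (1 - \cos\tau)(\hat{v} \cdot \vec{p})\,\hat{v} + \sin\tau\,(\hat{v} \times \vec{p}). \tag{2.68}$$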
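Rodrigues' formula translates directly into code. The following is a minimal sketch; the function name `rodrigues` is hypothetical, the direction `v` is assumed to be a unit vector, and the angle is in radians.

```python
import math

def rodrigues(p, v, tau):
    """Rotate the 3D point p about the unit direction v by angle tau (radians)."""
    c, s = math.cos(tau), math.sin(tau)
    dot = sum(vi * pi for vi, pi in zip(v, p))          # v . p
    cross = (v[1] * p[2] - v[2] * p[1],                  # v x p
             v[2] * p[0] - v[0] * p[2],
             v[0] * p[1] - v[1] * p[0])
    # cos(tau) p + (1 - cos(tau)) (v . p) v + sin(tau) (v x p)
    return tuple(c * pi + (1 - c) * dot * vi + s * xi
                 for pi, vi, xi in zip(p, v, cross))
```

For example, rotating the point (1, 0, 0) about the direction (0, 0, 1) by a quarter turn gives (0, 1, 0) up to rounding, and any point on the rotation axis is left unchanged.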
Figure 2-10: After the rotation by $\psi$ in Grimson, Huttenlocher, and Alter's
method, the plane of the model points is parallel to the image plane, $\vec{m}_0$ projects
onto $\vec{\imath}_0$, and $\vec{m}_1 - \vec{m}_0$ projects along $\vec{\imath}_1 - \vec{\imath}_0$.


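Using the formula, we can compute

$$R_{\hat{\imath}_{01}^{\perp},\phi}\, R_{\hat{\imath}_{01},\theta}\, \hat{\imath}_{01} = \cos\phi\,\hat{\imath}_{01} - \sin\phi\,\hat{z} \tag{2.69}$$

$$R_{\hat{\imath}_{01}^{\perp},\phi}\, R_{\hat{\imath}_{01},\theta}\, \hat{\imath}_{01}^{\perp} = \sin\theta\sin\phi\,\hat{\imath}_{01} + \cos\theta\,\hat{\imath}_{01}^{\perp} + \sin\theta\cos\phi\,\hat{z}.$$

Initially, $\vec{m}_{01}$ was rotated about $\hat{z}$ to align it with $\hat{\imath}_{01}$. In order for the scaled orthographic
projection of $\vec{m}_{01}$ to align with $\vec{\imath}_{01}$, Equation 2.69 implies that

$$s = \frac{\|\vec{\imath}_{01}\|}{\|\vec{m}_{01}\|} \frac{1}{\cos\phi} = \frac{d_{01}}{R_{01}} \frac{1}{\cos\phi}. \tag{2.70}$$

Then

$$s\,\Pi\, R_{\hat{\imath}_{01}^{\perp},\phi}\, R_{\hat{\imath}_{01},\theta}\, \hat{\imath}_{01} = \frac{d_{01}}{R_{01}}\,\hat{\imath}_{01} \tag{2.71}$$

$$s\,\Pi\, R_{\hat{\imath}_{01}^{\perp},\phi}\, R_{\hat{\imath}_{01},\theta}\, \hat{\imath}_{01}^{\perp} = \frac{d_{01}}{R_{01}\cos\phi}\left(\sin\theta\sin\phi\,\hat{\imath}_{01} + \cos\theta\,\hat{\imath}_{01}^{\perp}\right). \tag{2.72}$$

Next, we use the expressions in Equations 2.71 and 2.72 to constrain $\theta$ and $\phi$ such
that $\vec{m}_{02}$ projects along $\vec{\imath}_{02}$. When we aligned $\vec{m}_{01}$ and $\vec{\imath}_{01}$, $\vec{m}_{02}$ rotated to $R_{\hat{z},\psi}\vec{m}_{02}$. Since
$\vec{m}_{02}$ has no $z$ component (by assumption), we can represent $R_{\hat{z},\psi}\vec{m}_{02}$ by

$$R_{02}\cos\xi\,\hat{\imath}_{01} + R_{02}\sin\xi\,\hat{\imath}_{01}^{\perp},$$

where $\xi$ is a known angle. Consequently, the transformed, projected, and scaled $\vec{m}_{02}$,
which must equal $\vec{\imath}_{02}$, is

$$\begin{aligned}
& s\,\Pi\, R_{\hat{\imath}_{01}^{\perp},\phi}\, R_{\hat{\imath}_{01},\theta} \left(R_{02}\cos\xi\,\hat{\imath}_{01} + R_{02}\sin\xi\,\hat{\imath}_{01}^{\perp}\right) \\
&\quad = R_{02}\cos\xi \left(s\,\Pi\, R_{\hat{\imath}_{01}^{\perp},\phi}\, R_{\hat{\imath}_{01},\theta}\, \hat{\imath}_{01}\right) + R_{02}\sin\xi \left(s\,\Pi\, R_{\hat{\imath}_{01}^{\perp},\phi}\, R_{\hat{\imath}_{01},\theta}\, \hat{\imath}_{01}^{\perp}\right) \\
&\quad = R_{02}\cos\xi \left(\frac{d_{01}}{R_{01}}\,\hat{\imath}_{01}\right) + R_{02}\sin\xi \left(\frac{d_{01}}{R_{01}\cos\phi}\left(\sin\theta\sin\phi\,\hat{\imath}_{01} + \cos\theta\,\hat{\imath}_{01}^{\perp}\right)\right) \\
&\quad = \frac{d_{01} R_{02}}{\cos\phi\, R_{01}} \left(\cos\xi\cos\phi + \sin\xi\sin\theta\sin\phi\right)\hat{\imath}_{01} + \frac{d_{01} R_{02}}{\cos\phi\, R_{01}} \sin\xi\cos\theta\,\hat{\imath}_{01}^{\perp}.
\end{aligned}$$

Similar to $R_{\hat{z},\psi}\vec{m}_{02}$, we can represent $\vec{\imath}_{02}$ as

$$\vec{\imath}_{02} = d_{02}\cos\omega\,\hat{\imath}_{01} + d_{02}\sin\omega\,\hat{\imath}_{01}^{\perp},$$

where $\omega$ is known. By equating terms we get

$$\frac{d_{01} R_{02}}{d_{02} R_{01}} \left(\cos\xi\cos\phi + \sin\xi\sin\theta\sin\phi\right) = \cos\phi\cos\omega \tag{2.73}$$

$$\frac{d_{01} R_{02}}{d_{02} R_{01}} \left(\sin\xi\cos\theta\right) = \cos\phi\sin\omega. \tag{2.74}$$

These two equations can be solved to get a biquadratic in $\cos\phi$:

$$\sin^2\omega\,\cos^4\phi - \left(t^2 + 1 - 2t\cos\omega\cos\xi\right)\cos^2\phi + t^2\sin^2\xi = 0, \tag{2.75}$$

where

$$t = \frac{R_{02}\, d_{01}}{R_{01}\, d_{02}}. \tag{2.76}$$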
Since $R_{\hat{z},\psi}\vec{m}_{01}$ is aligned with $\vec{\imath}_{01}$, we need $\cos\phi$ to be positive so that $\vec{m}_{01}$ projects in
the same direction as $\vec{\imath}_{01}$. The positive solutions are given by

$$\cos\phi = \frac{1}{\sin\omega} \sqrt{\nu \pm \sqrt{\nu^2 - t^2\sin^2\omega\,\sin^2\xi}} \tag{2.77}$$

with

$$\nu = \tfrac{1}{2}\left(1 + t^2 - 2t\cos\omega\cos\xi\right).$$

This equation gives up to two solutions, but Grimson et al. make no further attempt to
show which solutions exist when, except to say the equation gives real solutions only if
$\nu \ge 0$, or

$$\cos\omega\cos\xi \le \frac{1 + t^2}{2t}. \tag{2.78}$$

Given $\phi$, Equations 2.73 and 2.74 provide $\theta$:

$$\cos\theta = \frac{\sin\omega\cos\phi}{t\sin\xi} \tag{2.79}$$

$$\sin\theta = \frac{\cos\phi\,(\cos\omega - t\cos\xi)}{t\sin\xi\sin\phi}. \tag{2.80}$$

Given any model point $\vec{m}$, we can use the computed angles along with Rodrigues'
formula to find its image location. In particular, once $\vec{m}_0$ and $\vec{\imath}_0$ have been subtracted
out, only the scale and 3D rotation are left. The scale is given by Equation 2.70, and, as
shown above, the rotation is

$$R = R_{\hat{\imath}_{01}^{\perp},\phi}\, R_{\hat{\imath}_{01},\theta}\, R_{\hat{z},\psi}. \tag{2.81}$$

As with Ullman's method (Section 2.9.2), we do not know the signs of $\sin\theta$ and $\sin\phi$, but
only that inverting both signs simultaneously corresponds to the reflective ambiguity.
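As a rough illustration of how Equations 2.70 and 2.76 to 2.80 fit together, the sketch below solves for the out-of-plane angles from the six measured quantities. It is an assumption-laden sketch rather than the authors' code: the name `gha_out_of_plane` is hypothetical, $\sin\phi$ is taken as nonnegative (the other sign gives the reflected solution), and the degenerate cases $\sin\omega = 0$, $\sin\xi = 0$, and $\phi = 0$ are simply not handled.

```python
import math

def gha_out_of_plane(R01, R02, xi, d01, d02, omega):
    """Return candidate (s, cos_phi, cos_theta, sin_theta) tuples, taking sin_phi >= 0.

    R01, R02, xi: lengths and angle measured on the model triple.
    d01, d02, omega: lengths and angle measured on the image triple.
    """
    t = (R02 * d01) / (R01 * d02)                                  # Eq. 2.76
    nu = 0.5 * (1 + t * t - 2 * t * math.cos(omega) * math.cos(xi))
    disc = nu * nu - (t * math.sin(omega) * math.sin(xi)) ** 2
    candidates = []
    if disc < 0:
        return candidates                                          # no real solution
    for sign in (1.0, -1.0):
        # cos^2(phi) from the biquadratic, Eq. 2.75 / 2.77
        x = (nu + sign * math.sqrt(disc)) / math.sin(omega) ** 2
        if not 0.0 < x < 1.0:
            continue                                               # cos(phi) must lie in (0, 1)
        cos_phi, sin_phi = math.sqrt(x), math.sqrt(1.0 - x)
        cos_theta = math.sin(omega) * cos_phi / (t * math.sin(xi))             # Eq. 2.79
        sin_theta = (cos_phi * (math.cos(omega) - t * math.cos(xi))
                     / (t * math.sin(xi) * sin_phi))                           # Eq. 2.80
        s = d01 / (R01 * cos_phi)                                  # Eq. 2.70
        candidates.append((s, cos_phi, cos_theta, sin_theta))
    return candidates
```

A candidate whose `cos_theta` or `sin_theta` falls outside [-1, 1] does not correspond to a real rotation and would need to be discarded by the caller; the sketch leaves that check out for brevity.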


2.9.5  Summary  of  the  three  computations 

Here I summarize how each method can be used to compute 3D pose from three corresponding
points. To begin, transform the model and image points so that (1) the model
points lie in the image plane, (2) $\vec{m}_0$ and $\vec{\imath}_0$ are at the origin of the image coordinate
system, and (3) $\vec{m}_1 - \vec{m}_0$ and $\vec{\imath}_1 - \vec{\imath}_0$ lie along the $x$ axis. Then use one of the three
methods to compute the scale factor and out-of-plane rotation, as follows:




•  Ullman's method

1.  Use Equations 2.52-2.54 to get $a$, $b$, and $c$.

2.  Substitute $a$, $b$, and $c$ into Equation 2.55 to get $s$.

5.  Calculate  coso  =  ^  and  coaO  = 

sj'i  syj 

4.  Calculate $\sin\phi = \sqrt{1 - \cos^2\phi}$ and $\sin\theta = \sqrt{1 - \cos^2\theta}$.

5.  Construct the rotation matrix $R$ using Equation 2.47.

•  Huttenlocher and Ullman's method

1.  Solve Equations 2.56 and 2.57 for $l_{11}$ and $l_{12}$, and Equations 2.58 and 2.59 for
$l_{21}$ and $l_{22}$.

2.  Let $\vec{e}_1 = (0, 1)$ and $\vec{e}_2 = (1, 0)$. (Any orthogonal, equal-length vectors can be
used.)

3.  Use Equation 2.60 to get $\vec{e}_1'$ and $\vec{e}_2'$.

4.  Substitute $\vec{e}_1'$ and $\vec{e}_2'$ into Equations 2.61 and 2.62 to get $k_1$ and $k_2$.

5.  Substitute $k_1$ and $k_2$ into Equation 2.64 to get $c_1$.

6.  If $c_1 \ne 0$, calculate $c_2 = k_1 / c_1$. Otherwise get $c_2$ from Equation 2.65.

7.  Use Equation 2.67 to get $s$.

8.  Use Equation 2.66 to get $sR$. Divide through by $s$ if $R$ is desired instead of
$sR$.

•  Grimson, Huttenlocher, and Alter's method

1.  From the model points, compute $R_{01}$, $R_{02}$, and $\xi$, and, from the image points,
compute $d_{01}$, $d_{02}$, and $\omega$.

2.  Use Equation 2.76 to get $t$.

3.  Use Equation 2.77 to get $\cos\phi$.

4.  Use Equation 2.70 to get $s$.

5.  Calculate $\sin\phi = \sqrt{1 - \cos^2\phi}$.

6.  Use Equations 2.79 and 2.80 to get $\cos\theta$ and $\sin\theta$.

7.  To transform any point $\vec{p}$, substitute $\cos\phi$, $\sin\phi$, $\cos\theta$, $\sin\theta$, and $\vec{p}$ into Rodrigues'
formula, Equation 2.68, to get $R\vec{p} = R_{\hat{\imath}_{01}^{\perp},\phi}\, R_{\hat{\imath}_{01},\theta}\, \vec{p}$.
2.10  Conclusion

The weak-perspective three-point problem is fundamental to many approaches to model-based
recognition. In this chapter, I illustrated the underlying geometry, and then used
it to derive a new solution to the problem, and to explain the various special cases that
can arise: the special cases determine when there are zero, one, and two solutions. Also,
I discussed earlier solutions to the problem in detail.

The new solution is based on the distances between the matched model and image
points, and is used to obtain an expression for a direct alignment of a model to an image.
As a result, the solution given here may be easier to use, and, for recognition systems
that repeat the computation of the model pose many times, may be more efficient.

Furthermore, the geometric approach in this chapter provides additional insights into
the problem. First, it was demonstrated that the pose solution may be unstable either
when the model points are nearly collinear or when the plane of the matched model
points is parallel to the image plane. This property is not particular to the pose solution
given here, but is inherent in the underlying geometry. Second, this chapter resolves
which solution to Ullman's original biquadratic is correct, and, specifically, shows that
the false solution corresponds geometrically to inverting the roles of the model and image
points. Also, this chapter makes evident the symmetry of the solution with respect to
the ordering of the points. In general, the geometric approach has been useful in gaining
understanding of the basic problem, and may prove useful for providing insights when a
related problem needs to be solved.
Chapter 3

Uncertainty  in  Point  Features 


As discussed in Chapter 1, features derived from an image invariably contain errors. The
approach in Section 1.3 uses triples of matched point features to generate hypotheses, and
then uses model features such as points, line segments, and curve segments for deciding
which hypotheses are correct. To decide correctness, the algorithm uses the matched
point triples to predict the image locations of the model features (step 2b). Errors in
the locations of the image points, however, lead to uncertainty in the predicted locations
of these model features. Consequently, in step 2c of the algorithm, the hypothesized
three-point match is used to compute search regions for finding matches to the model
features.

In the past, to account for the uncertainty, people tried considering all image features
as candidate matches [Grimson84]; however, the combinatorics of such an approach are
prohibitive [Grimson90a]. In addition, people tried looking for matches in a region of
fixed size and shape about each predicted feature [Huttenlocher88], but this assumes the
size and shape do not significantly change. If this assumption is wrong, it can lead to
correct hypotheses being discarded and incorrect hypotheses being accepted, occurrences
of which are known as false negatives and false positives, respectively.

Using a standard model for error in the image points (Section 3.1), this chapter shows
that the shapes of the uncertainty regions for point features generally do not change,
but the sizes can change considerably. Further, it is demonstrated that the uncertainty
regions generally are circular (Section 3.2), and a method is given for fitting "uncertainty
circles" to them (Section 3.4). In addition, the situations where the uncertainty regions
are not circular are described (Section 3.3). Lastly, the uncertainty circles are compared
to previous uncertainty propagation techniques (Section 3.5).


3.1  Bounded  Error  Model 


The errors in the image points arise from several sources, including the optics used to
obtain the image, the edge detector, and the process for finding distinguishing points
from edges. The effect of these errors is to alter the locations of the image points. These
locations will be altered by at most some amount $\epsilon$, which empirically seems to be about
five pixels. Circles of radius $\epsilon$ can then be used to bound the error in the sensed or
nominal locations of the image points. This approach to modelling error is known as
a "bounded error model," and has been used often for performing robust recognition
[Grimson84] [Baird85] [Ellis87] [Cass90] [Jacobs91].


3.2  Uncertainty  Circles  for  Bounding  Uncertainty 
Regions 

To see how well uncertainty circles do for bounding the errors in the image locations of
predicted model points, this section runs two experiments that compare the true regions
to the circular fits. The radius of each circle is computed by taking the maximum distance
from the nominal point to a point on the boundary. To compare the regions, we need
a measure of error between the true region and the approximation. When the circular
approximations are poor, the circles will badly over-bound the true regions. One measure
is what fraction of the circle the true region leaves uncovered. Let $A_t$ equal the area of
the true region and let $A_c$ equal the area of the approximating circle. Then the fraction
just mentioned is given by $(A_c - A_t)/A_c$. When the approximation is good, however, we want to
know the relative error from the true value, which is given by $(A_c - A_t)/A_t$. Using the fraction
of the area when the uncertainty circle over-bounds the true region and using the relative
error when it does not, the error measure is

$$\begin{cases} \dfrac{A_c - A_t}{A_c} & \text{if } A_c \ge A_t, \\[2ex] \dfrac{A_c - A_t}{A_t} & \text{otherwise,} \end{cases} \tag{3.1}$$

where the sign is used to discriminate the two cases. This expression is the same as

$$\frac{A_c - A_t}{\max(A_c, A_t)}.$$

In this expression, the max is needed even though in theory $A_c \ge A_t$, because the method
used below to compute $A_t$ can overestimate it a little when the circular approximation
is good.
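The error measure is a one-line computation. The sketch below is illustrative only; the function name `circle_fit_error` is hypothetical, and both areas are assumed to be positive and expressed in the same units (pixels, in the experiments that follow).

```python
def circle_fit_error(area_circle, area_true):
    """Signed error of the circular fit, per Equation 3.1.

    Positive when the circle over-bounds the true region (fraction of the
    circle left uncovered); negative when the measured true area exceeds
    the circle (relative error from the true value).
    """
    return (area_circle - area_true) / max(area_circle, area_true)
```

Multiplying the result by 100 gives the percent errors reported in the experiments. For instance, a circle of area 100 around a true region of area 50 gives 0.5, while swapping the two areas gives -0.5.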

Experiment  1:  Accuracy  of  uncertainty  circles  for  random  models 

If we are using uncertainty circles in a recognition system, we wish to know how often to
expect the uncertainty circles to be correct. In terms of the error measure (Equation 3.1),
we wish to know what percent of the time the maximum value of the error measure will
be at most, say, 1% or 10%. Conversely, what will be the maximum error 90% of the
time, 95% of the time, or 99% of the time?

To estimate these numbers, I ran a series of trials to simulate an actual system and
computed the error measure for each. The percent of time the error measure is expected
to satisfy some criterion is estimated by the fraction of trials over which it satisfies that
criterion. For an actual system, I consider an alignment algorithm that selects triples of
points from an image and matches them to triples of points from a model. I assume
that the points in the image effectively arise at random, which is reasonable if the image
contains significant clutter.

Method 

This experiment runs one hundred trials where a model is projected into an image
and the error measure of Equation 3.1 is computed for each model point. In each trial
a random triple of image points is matched to a random triple of model points taken
from a randomly-generated model (see Appendix C for details). The three-point match
is used to project the model into the image, which gives the nominal image locations of
the model points. As described in Chapter 2, except for model points in the plane of the
matched model points, there are two possibilities for each nominal image location.

Using $\epsilon = 5$, the $\epsilon$-circles around the three image points are sampled uniformly at
twenty-five points each (see Fig. 3-3). Every triple of points between the samples on the
uncertainty circles is matched to the three model points. Each match is used to compute
the image locations of all the model points. This results in a set of boundary and interior
points for uncertainty regions. The area of each region is computed by scanning its points
into an image and counting the number of pixels within the resulting image boundary
(this step is explained further in Appendix D). There are two sets of boundary points
corresponding to the two weak-perspective solutions (Chapter 2), which results in two
areas per uncertainty region (see Appendix D).

Given the boundary points for an uncertainty region, the radius of the corresponding
uncertainty circle is obtained by taking the maximum distance from the nominal point
to a boundary point. For a radius $r$, the area of the circle is $\pi r^2$.
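The sampling step and the circle fit can be sketched as follows. This is a minimal illustration, not the code used for the experiments: the function names are hypothetical, the pose computation that maps each sampled triple to propagated model points is left out, and the pixel-counting area computation of Appendix D is not reproduced.

```python
import math
from itertools import product

def sample_circle(center, eps, n):
    """n points spaced uniformly on the circle of radius eps about center."""
    cx, cy = center
    return [(cx + eps * math.cos(2 * math.pi * k / n),
             cy + eps * math.sin(2 * math.pi * k / n)) for k in range(n)]

def sampled_triples(image_triple, eps=5.0, n=25):
    """Every combination of one sample per epsilon-circle (n**3 triples)."""
    return product(*(sample_circle(p, eps, n) for p in image_triple))

def uncertainty_circle(nominal, region_points):
    """Radius and area of the circle about `nominal` covering all region points."""
    r = max(math.hypot(x - nominal[0], y - nominal[1]) for x, y in region_points)
    return r, math.pi * r * r
```

With twenty-five samples per circle, `sampled_triples` yields 25 x 25 x 25 = 15,625 image triples per match, each of which would be passed to the pose solver to propagate the model points.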

Results and Discussion

Over the 100 trials, 1164 uncertainty regions were tested. The average area was 583.
for the correct uncertainty regions and 662.43 for the approximating circles. Fig. 3-1
shows a histogram of the percent errors in the circular approximation (using the error
measure). The largest peak of the histogram is at 0. The average percent error is -10.82,
the median is between -11 and -12, and the range is [-35.11, 81.65]. Negative errors
occur because, when the fit is good, the method used to compute the true regions may
actually overestimate them a little (Appendix D). The large negative errors are all for
situations where the circles are very small (radii between five and eight pixels); the error
measure is sensitive to these cases because of the slight overestimation in the method for
computing regions. The errors of particular concern are large positive errors, which arise
when the uncertainty circles are large overestimates. As will be seen next, such errors
occur rarely.

By summing up to an index in the histogram and then dividing by the total number
of entries, we get the fraction of time that the error was less than that index. Doing so
gives that 96.73% of the time the error between the true region and the approximation
was less than 1%, and 97.91% of the time the error was less than 10%. Conversely, the
maximum error 90% of the time was 1%, 95% of the time it was also 1%, 98% of the
time it was 10%, and 99% of the time it was 51%. These results suggest that uncertainty
circles are generally very accurate.
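The two kinds of summary statistics quoted here can be computed from a list of per-region percent errors as in the sketch below. The function names are hypothetical, and the sketch works on the raw error values rather than on integer histogram bins, which is an assumption about a detail the text does not spell out.

```python
import math

def fraction_below(errors, limit):
    """Fraction of regions whose percent error is less than `limit`."""
    return sum(e < limit for e in errors) / len(errors)

def max_error_at(errors, fraction):
    """Smallest error bound that holds for at least `fraction` of the regions."""
    ranked = sorted(errors)
    k = math.ceil(fraction * len(ranked))     # number of regions that must be covered
    return ranked[max(k, 1) - 1]
```

Calling `fraction_below(errors, 1)` corresponds to statements of the form "the error was less than 1% some percent of the time", and `max_error_at(errors, 0.99)` to statements of the form "the maximum error 99% of the time".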

Experiment  2:  Accuracy  of  the  uncertainty  circles  for  the  telephone  model 

The experiments on random models indicate that for most objects the circular approximations
are good. To see how accurately random models reflect true objects, I took a
model of a typical object for the system to handle, a telephone (Fig. 3-2), and re-ran the
same set of trials. The telephone model was built by hand. The model points are listed
in Table 3.1. The first eight points were measured in inches on the actual object, and
the last two were added to make the appearance of the model more complete.

Point    x       y       z
  1      0       0       0
  2      9       0       0
  3      9       4.625   0
  4      0       4.625   0
  5      0       0       1.625
  6      3.5     0       3.5
  7      3.5     4.625   3.5
  8      0       4.625   1.625
  9      9       0       3.5
 10      9       4.625   3.5

Table 3.1: Points on the telephone model.

Method

The method is the same as in Experiment 1, except that the telephone model was
used at every trial instead of a new, random model (Fig. 3-3).

Results and Discussion

For 100 trials with the phone model, 1092 uncertainty regions were generated. The
average area was 493.59 for the correct uncertainty regions and 450.13 for the approximating
circles. Notice that this time the average area for the overestimates is lower than
for the exact areas. This is because, as mentioned earlier, the method used to compute
the true regions can overestimate them a little when the fit is good (Appendix D). This
effect turned out to be stronger than the overestimate in the circular fit, because very
few of the circular fits were poor.

The resulting histogram for the phone model is shown in Fig. 3-4, overlaid with
the histogram for random models. The distributions are similar, with the phone model
having a smaller range of values. The average percent error is -10.72, the median is
between -11 and -12, and the range is [-31.37, 27.02]. Observe that the average and
median errors are very close to those for random models.

For the phone model, 98.01% of the time the error between the true region and the
approximation was less than 1%, and 99.08% of the time it was less than 10%. As before,
the maximum error 90% and 95% of the time was 1%. This time, however, the maximum
error 98% of the time was also 1%. Further, 99% of the time the maximum error was 10% instead
of 51%. So it appears the circular fits work better for the specific model of a telephone.

Figure 3-3: Propagated uncertainty regions for a telephone model (both solutions are shown).
The two small, unfilled circles are sampled $\epsilon$-circles (the third one is occluded). Observe that
the propagated regions appear circular.


3.3  Cases  Where  Errors  Are  Greatest 

This section looks closely at the cases where the errors are large. Doing so may help
to infer the situations where circles are poor approximations, which is important for
knowing when the uncertainty circles badly overestimate the true regions. Also, knowing
when the approximation breaks would allow for avoiding these cases or handling them
specially.

Of the one hundred trials on random models, there were two which had errors greater
than 25%. For each trial, Fig. 3-5 displays the uncertainty region and uncertainty circle
that had the largest error. For one trial, the largest error was 78.8%. The model point
with this error had extended affine coordinates (2.057, -2.227, .01568) (extended affine
coordinates were defined in Chapter 2). For this trial, the three matched model points
were (15, -71, -112), (-48, 57, -7), and (-5, 59, -70), and the three matched image
points were (296, IKi), (152, 250), and (120, 556). More interestingly, the angles between
these points are

    24.39°, 107.03°, 48.58°  for the model points
    24.15°, 107.99°, 47.86°  for the image points

Notice that these angles are very close. Geometrically, this means the plane of the model
points is almost parallel to the image, a situation which Chapter 2 warned was unstable.

For the other trial, the largest error was 81.7%, and the extended affine coordinates of
the corresponding model point were (-.7151, 1.404, .002413). The three matched model
points were (9, -19, 170), (-3, 35, 0), and (-83, 2, 57). The three matched image points
were (272, 191), (34, 198), and (101, 314). The angles between the points are

    35.40°, 86.50°, 58.10°  for the model points
    34.04°, 84.28°, 61.67°  for the image points

Again the angles are very close, which means the plane of the matched model points is
almost parallel to the image. These cases suggest that we should be cautious with the
uncertainty circles when the model plane is nearly parallel to the image.

Figure 3-6: Largest error for the telephone model: 27.0%.

The true uncertainty regions pictured in Fig. 3-5 have strange shapes. The concavity
in the larger region is due to the interior of the region not being filled, which is a result of
sampling only the boundaries of the error regions of the matched image points. Ignoring
the concavity, there is an almost straight line bounding part of the region. The source of
this line is the way the uncertainty regions are computed. As explained in Appendix D,
the propagated points are separated into two groups in order to handle the two solutions
for pose (see Chapter 2). The points are separated according to whether $H_1$ or $H_2$ from
the pose solution is positive or negative. For Fig. 3-5, if all the points from both solutions
were plotted, then a smoothly curved boundary for the entire region could be expected.

For the phone model, there was only one trial out of the one hundred which had
errors greater than 25%, and the largest error in this case was 27.0%. The uncertainty
region and uncertainty circle are shown in Fig. 3-6. The extended affine coordinates of
the point with largest error were (.02475, .3611, -.000032), and the angles between the
model and image points were

    32.80°, 57.20°, 90.00°  for the model points
    37.07°, 61.36°, 81.57°  for the image points

Two of the angles are close, but not as close as they were for random models. At the
same time, the worst-case error is not nearly as bad as for random models.

Fig. 3-7 displays the regions with the largest negative errors for the trials on random
models and the phone model. Recall that negative errors arise because there may be
extra pixels counted along the boundaries of the true regions when computing the areas
(see Appendix D). From the figure, negative errors can be as low as -35% while the
approximation is still visibly good.

In summary, we can infer that, in an alignment system that tries many or all pairs
of point triples for aligning a model to the image, situations with large errors could
be avoided by checking whether the angles between the points are similar. However,
this may lead to relying on an arbitrary threshold. Consequently, it perhaps would be
better to handle these cases specially by using another technique such as that used in the
experiments, namely, to sample extensively and then walk the boundaries of the resulting
regions.
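The angle check suggested in this summary could look like the sketch below. It is hypothetical in its details: the function names and the tolerance parameter `tol_deg` are assumptions, and, as the text cautions, any particular tolerance is an arbitrary threshold.

```python
import math

def triangle_angles(p0, p1, p2):
    """Interior angles in degrees at p0, p1, p2; points may be 2D or 3D."""
    def angle_at(a, b, c):
        u = [bi - ai for ai, bi in zip(a, b)]
        v = [ci - ai for ai, ci in zip(a, c)]
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        # Clamp to guard against rounding slightly outside [-1, 1].
        return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
    return angle_at(p0, p1, p2), angle_at(p1, p2, p0), angle_at(p2, p0, p1)

def angles_are_similar(model_triple, image_triple, tol_deg):
    """True when every model angle is within tol_deg of the matching image angle."""
    model = triangle_angles(*model_triple)
    image = triangle_angles(*image_triple)
    return all(abs(m - i) <= tol_deg for m, i in zip(model, image))
```

Applied to the image points (272, 191), (34, 198), and (101, 314) from the second trial above, `triangle_angles` gives roughly 34.0°, 61.7°, and 84.3° at the three points in the order given. A match flagged by `angles_are_similar` would then be handled by the extensive-sampling fallback rather than by a fitted circle.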
3.4  Computing Uncertainty Circles


Given that circles centered at the nominal points approximate well the uncertainty region
boundaries, all that is needed is to compute the radii of the circles. Since only one
boundary point is needed to compute the radius, a straightforward approach is to sample
points from the error circles around the matched image points and take the maximum
distance from the nominal point as the radius. This process will be efficient if few sample
points are required.

Experiment  3:  Using  fewer  sample  points  for  random  models 

To see how few sample points are needed, this experiment tests, for various numbers of
points, $n$, and for a series of trials, the percent of time (fraction of trials) that the error in
using $n$ points instead of twenty-five is less than some limit. Twenty-five is the number
of points used in the last two experiments.

Method 

A series of one hundred trials is run using random image triples matched to random
model triples from randomly-generated models, using the same method as in Experiment 1.
For each trial, the error circles around the matched image points are sampled
uniformly at twenty-five points and ten points. For each propagated uncertainty region,
the error in using the smaller number of samples relative to using twenty-five samples is computed.
This is repeated for nine, eight, and seven sample points as well.
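The comparison between $n$ samples and twenty-five samples can be illustrated with a simplified stand-in for the full pose computation. The sketch below is restricted, as an assumption made for brevity, to a model point lying in the plane of the three matched model points, whose image under weak perspective is the affine combination of the image points with the same affine coordinates (`alpha`, `beta`). The function names and the sign convention of `percent_error` (positive when the smaller sample set underestimates the radius) are hypothetical.

```python
import math
from itertools import product

def radius_from_samples(image_triple, alpha, beta, eps=5.0, n=25):
    """Uncertainty-circle radius for an in-plane point with affine coordinates (alpha, beta)."""
    def predict(i0, i1, i2):
        # Affine combination of the three image points.
        return (i0[0] + alpha * (i1[0] - i0[0]) + beta * (i2[0] - i0[0]),
                i0[1] + alpha * (i1[1] - i0[1]) + beta * (i2[1] - i0[1]))

    def circle(c):
        # n uniformly spaced samples on the eps-circle about c.
        return [(c[0] + eps * math.cos(2 * math.pi * k / n),
                 c[1] + eps * math.sin(2 * math.pi * k / n)) for k in range(n)]

    nx, ny = predict(*image_triple)                     # nominal image location
    propagated = (predict(*t) for t in product(*map(circle, image_triple)))
    return max(math.hypot(x - nx, y - ny) for x, y in propagated)

def percent_error(image_triple, alpha, beta, n, n_ref=25):
    """Percent difference between the n_ref-sample radius and the n-sample radius."""
    r_ref = radius_from_samples(image_triple, alpha, beta, n=n_ref)
    r_n = radius_from_samples(image_triple, alpha, beta, n=n)
    return 100.0 * (r_ref - r_n) / r_ref
```

Because different values of `n` place the samples at different angles on the circles, the smaller sample set can occasionally find a slightly larger radius, which gives a small negative percent error.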

Results and Discussion

The results are shown in Table 3.2. It may be observed that the percentages do not
strictly decrease as fewer sample points are used. This can be explained by the fact
that the circles around the image points are sampled uniformly, so that using different
numbers of sampled points can give different samples on the circles. Consequently, when
the percentages are close, there may be cases where fewer sample points do better.
Nevertheless, this effect should be small. Notice that the average percent error does indeed
increase monotonically.

We can use Table 3.2 to pick a reasonable number of points for sampling the image
error circles. From the table, if we permit 5% error, then using eight sample points
instead of twenty-five can be expected to be accurate over 99% of the time. Also, the
average error in using eight points is very small (1.137%).

A better feel for how accurate is the use of fewer sample points is given by statistics
on the radii, shown in Table 3.3. From the table, the average difference in the radii for
eight sample points was .08 pixels, and the worst case difference was 3.24 pixels. Relative
to the radius for twenty-five points, the average difference is .573%, and the maximum
difference is 8.96%.

Points    1%      2%      3%      4%      5%      6%      ave     max     min
  10      72.06   93.12   98.61   99.13   99.22   99.56   0.684   21.97   -.34
   9      57.27   84.16   98.00   98.70   98.87   99.04   1.031   31.60   -.37
   8      55.61   75.98   90.43   98.87   99.39   99.57   1.137   17.12   -.49
   7      46.30   63.27   77.89   91.65   97.91   98.61   1.670   25.53   -.31

Table 3.2: Percentage of time the error was less than each limit (1% to 6%) for different
numbers of sample points. Also shown are the average, maximum, and minimum percent
errors over all the trials. Results are based on 1149 propagated uncertainty regions using
random models.

Points    ave     max     min     ave percent   max percent   min percent
  10      .05     2.55    -.05    .344          11.67         -.17
   9      .08     3.87    -.03    .521          17.30         -.18
   8      .08     3.24    -.05    .573           8.96         -.24
   7      .13     4.21    -.02    .844          13.70         -.16

Table 3.3: Differences in radii for different numbers of sample points. Results are based
on 1149 propagated uncertainty regions using random models.

Experiment  4:  Using  fewer  sample  points  for  telephone  model 

Method 

This  experiment  is  the  same  as  Experiment  3,  except  that  the  phone  model  is  used 
instead  of  random  models. 

Results  and  Discussion 

Tables 3.4 and 3.5 give the results. From Table 3.4, we again can use eight points
to limit errors to 5% over 99% of the time. From observing both tables, it appears that
using fewer sample points works slightly better with the phone model than with random
models.

 n      1%     2%     3%     4%     5%     6%     ave     max    min
10   66.40  91.91  99.37  99.55  99.64  99.64   0.726    8.55   -.33
 9   60.83  84.19  98.02  99.46  99.55  99.55   0.913   13.19   -.33
 8   62.15  91.91  99.37  99.55  99.64  99.64   0.981   11.76   -.30
 7   46.46  64.24  80.32  92.81  98.38  99.55   1.532   12.80   -.33

Table 3.4:
Percentage of time error was less than 1%-6% for different numbers of sample points (n). Also shown are
the average, maximum, and minimum percent errors over all the trials. Results are based on 1113
uncertainty regions using the telephone model.

 n    ave    max    min   ave percent   max percent   min percent
10    .05   0.69   -.05      .365           4.37          -.17
 9    .06   1.30   -.02      .459           6.83          -.17
 8    .07   1.12   -.03      .494           6.07          -.25
 7    .10   1.23   -.03      .772           6.62          -.16

Table 3.5:
Differences in radii for different numbers of sample points (n). Results are based on 1113 uncertainty regions
using the telephone model.

To illustrate the use of uncertainty circles, Fig. 3-8 shows an example of the propagated
uncertainty circles, where eight sample points were used. The three smallest circles
correspond to the assumed errors in the matched image points, which in this example
were matched correctly. For the unmatched model points, the other circles show the
regions to be searched for matching image points. The self-occluded model points were
removed beforehand. Still, some of the remaining corner points are occluded by other
objects, and the uncertainty regions provide a means to reason that this is so after a
relatively small amount of search in the image.

Notice that the sizes of the propagated uncertainty regions vary considerably for
different model points. Consequently, an approach that relies on fixed-sized error bounds,
as in [Huttenlocher88], can lead to correct matches being missed (when the bounds are
too small), and incorrect matches being accepted (when the bounds are too large and
include spurious image points).
Method                Models   Type     Selectivity
Uncertainty Circles   Random   Solid    .003722
Uncertainty Circles   Phone    Solid    .003747
Bounding Polygons     Random   Solid    .008279
Exact Circles         Random   Planar   .002783

Table 3.6:
Expected selectivities of point features.

3.5  Expected  Selectivity  of  Point  Features 

The probability that a feature distributed randomly over an image falls into an uncertainty
region is known as the selectivity of the region [Grimson92b]. This quantity is
useful for analyzing the reliability of recognition systems [Grimson92a] [Grimson92b],
including, as will be seen in Chapters 5 and 6, the system proposed here. For point
features, the selectivity is the area of the region divided by the image area, where
the area of the region takes into account the uncertainty in the unmatched image points
by expanding the propagated region outwards by $\epsilon$.

In the past, the concept of selectivity has been applied to alignment where the models
are flat [Grimson92b], and also to alignment with solid models but using a different
uncertainty propagation technique [Grimson92a]. When the models are flat, the propagated
uncertainty regions can be computed exactly. It would be interesting to see how much
the chance of a false positive increases from planar to solid models. Also, it would be
useful to know how the uncertainty propagation technique used here compares to the one
in [Grimson92a]. We can use the expected selectivity to make these comparisons.

Experiments  5  and  6:  Expected  selectivity  of  point  features 

Method 

To compute the expected selectivity, I re-ran 1000 trials of the same type as in
Experiments 3 and 4, except five was added to each radius before computing the area, in
order to account for expanding the uncertainty region outwards by $\epsilon = 5$ pixels.

Results  and  Discussion 

Using random models with eight sample points over 1000 trials gave 11349 propagated
regions with average area 973.25 square pixels. Using the phone model with eight sample
points over 1000 trials gave 11085 propagated regions with average area 979.78 square
pixels. For an image of size 454 x 576, the resulting selectivities along with those for
[Grimson92a] and [Grimson92b] are shown in Table 3.6. The expected selectivity for the
uncertainty circles is about half that for [Grimson92a], which implies that the uncertainty
circles should give significantly better performance. Furthermore, it appears that the
selectivities of solid models are only slightly greater than for planar ones. We can infer
from this that, when point features are used, recognizing solid objects with alignment is
only a little more sensitive to false positives than recognizing planar objects.
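
The selectivity of a point feature is a simple ratio, and the reported values can be checked directly. The sketch below is illustrative only: the function names are hypothetical, and the image size of 454 x 576 pixels is an assumption chosen because it is consistent with the reported areas and selectivities.

```python
import math

def expanded_circle_area(radius, epsilon):
    """Area of an uncertainty circle after growing its radius by epsilon pixels."""
    return math.pi * (radius + epsilon) ** 2

def point_selectivity(region_area, image_width, image_height):
    """Probability that a uniformly random image point falls inside the region."""
    return region_area / (image_width * image_height)

# Average areas quoted in the text (square pixels); image size is an assumption.
random_models = point_selectivity(973.25, 454, 576)
phone_model = point_selectivity(979.78, 454, 576)
print(round(random_models, 6), round(phone_model, 6))
```

With these inputs the two ratios round to the selectivities listed for the uncertainty circles in Table 3.6.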
Chapter 4

Uncertainty in Line Features


The preceding chapter dealt with uncertainty in the predicted locations of point features
(step 2c of the alignment algorithm of Section 1.3). A set of distinguished points, however,
usually is not reliable at identifying a model in an image. Consequently, recognition
systems often use more extended features such as line segments for verification [Bolles82]
[Goad83] [Lowe85] [Ayache86] [Horaud87] [Huttenlocher90]. This chapter extends the
uncertainty analysis of the preceding chapter to line features (Section 4.1). Furthermore,
a formula is derived for selectivity for line features (Section 4.2). The selectivity for lines
is demonstrated to be significantly less than for points (Section 4.2).


4.1  Line  Uncertainty  Regions 

Section 3.2 showed how to compute uncertainty circles to bound the propagated uncertainty
in predicted model points. We can use this result to bound the uncertainty in
predicted model line segments. First, for each model line segment, calculate the uncertainty
circles for its endpoints. Next, if we ignore fragmentation and partial occlusion, an
overestimate of the set of image line segments that could match a model line segment is
given by the set of all line segments connecting pairs of points in the two circles. To then
allow for some fragmentation and occlusion, we would also accept any sub-segment of
one of these line segments. We can find all candidates for a given model segment by first
gathering all image line segments that lie entirely within the uncertainty region formed
by the uncertainty circles and their common outer tangents (see Fig. 4-1). Then we will
keep only the line segments that can be extended to intersect both of the uncertainty
circles.
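
The two-step candidate test described above can be sketched as follows. This is a hedged illustration, not the thesis implementation: it treats the region bounded by the two circles and their common outer tangents as the convex hull of the two discs, tests hull membership with a ternary search, and uses hypothetical function names throughout.

```python
import math

def hull_margin(p, c1, r1, c2, r2, iters=60):
    """Minimum over t in [0, 1] of |p - c(t)| - r(t), with center and radius
    interpolated linearly between the two circles. The result is <= 0 exactly
    when p lies in the convex hull of the two discs. The function of t is
    convex, so a ternary search finds its minimum."""
    def f(t):
        cx = c1[0] + t * (c2[0] - c1[0])
        cy = c1[1] + t * (c2[1] - c1[1])
        return math.hypot(p[0] - cx, p[1] - cy) - (r1 + t * (r2 - r1))
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if f(a) < f(b):
            hi = b
        else:
            lo = a
    return f(0.5 * (lo + hi))

def line_hits_circle(p, q, c, r):
    """True if the infinite line through p and q passes within r of center c."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(c[0] - p[0], c[1] - p[1]) <= r
    return abs(dx * (c[1] - p[1]) - dy * (c[0] - p[0])) / length <= r

def is_candidate(segment, c1, r1, c2, r2, tol=1e-9):
    """Step 1: both endpoints inside the (convex) uncertainty region.
    Step 2: the extended segment intersects both uncertainty circles."""
    p, q = segment
    inside = all(hull_margin(e, c1, r1, c2, r2) <= tol for e in (p, q))
    return inside and line_hits_circle(p, q, c1, r1) and line_hits_circle(p, q, c2, r2)
```

Because the region is convex, checking the two endpoints is enough to establish that the whole segment lies inside it.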


4.2  Selectivity  of  Line  Features 

The selectivity of a line uncertainty region is the chance that a spurious line segment
randomly falls into it. Ideally, the line selectivity could be estimated by the chance
that the endpoints of a random line segment fall within the point uncertainty regions
of a predicted model segment's endpoints. With fragmentation and occlusion, however,
the endpoints of the corresponding image segment may not appear in those regions. To
allow for either endpoint to be occluded, the last chapter treated every model point
independently. By so doing, at least one of the endpoints is required to be unoccluded.
In addition, the constraint from the orientation of a model segment is lost. Instead of
looking for endpoints, we can look for pieces of the predicted model segments, as described
in Section 4.1. If pieces of line segments are used, which still constrain the orientation
and partially constrain the length, the selectivity for line segments can be expected to
be much less than for points.


4.2.1  Non-overlapping  uncertainty  circles 

This section considers the case in which the uncertainty circles for the endpoints do not
overlap, which is the most common situation. Consider an image segment of known
length and orientation. There is a set of translations that place the segment within
the image. The line selectivity equals the fraction of these translations that place the
segment within the line uncertainty region. As a note, the set of translations of a line
segment of known length and orientation is the same as its configuration space
[Lozano-Perez87], since a translation determines the position of every point on the line segment.
The configuration space of an image segment with respect to an uncertainty region can
be obtained by shrinking the region along the segment's orientation and by its length.

Figure 4-2: Region of translations with orientation constraint and rectangular
upper bound.

Examples of the constraint from an image segment's orientation are illustrated by the
shaded regions in Fig. 4-2. The figure shows two cases, distinguished by the orientation
of the image segment relative to the orientation of the common outer tangent, which
from Fig. 4-3 is given by

$$\theta_1 = \sin^{-1}\frac{R - r}{L} \tag{4.1}$$

As shown in Fig. 4-4, the orientation of an image segment within the uncertainty region
is bounded by the orientations of the common crossed tangents of the uncertainty circles.
Letting $\theta_2$ be the maximum allowed orientation of a candidate image segment, from the
figure

$$\theta_2 = \sin^{-1}\frac{R + r}{L} \tag{4.2}$$

Here $r$ and $R$ are the radii of the smaller and larger uncertainty circles, and $L$ is the
distance between their centers.

Figure 4-3: Orientation of the common outer tangents to the circles.

Figure 4-4: Orientation of the common crossed tangents of the circles. This is the
maximum possible angle of a line segment with an endpoint in each circle.

Note that $\theta_1$ exists iff $L \ge R - r$, and $\theta_2$ exists iff $L \ge R + r$. If the uncertainty circles
do not overlap, then $L > R + r$.

Starting from the region of translations with orientation constraint, a set of translations
with length constraint also is obtained by shrinking the shaded region in Fig. 4-2
by the length of the image segment. The area of the region can be computed by moving
the image segment perpendicular to its orientation, as shown in Fig. 4-5, parameterized
by $u$. The area is given by summing the distances between $(x_1, y_1)$ and $(x_2, y_2)$ over the
range of $u$. Let $\ell$ be the length of the image segment. Appendix F.1 shows the area is
given by
j"  +  L  cos  0  +  yj —  {u  L  sin  6)^  —  \/r^  —  (hi 

if  L  sin  0  <  R  A  r. 
ot  herwise. 


0 


(4.3) 


where $u_{\max}$ involves a solution to a quadratic equation whose coefficients are given by
complicated expressions.

Computing the line segment selectivity with this formula is messy, and so instead I
compute a close overestimate. Fig. 4-6 shows a rectangular box which can be used to
bound the range of translations. For comparison, Fig. 4-2 shows the rectangular box
surrounding each corresponding line uncertainty region. From Fig. 4-6, the base of the
rectangle is $R + r + L\cos\theta$. Further, the height of the rectangle is $2r$ for the top picture
where $\theta \le \theta_1$, and $R + r - L\sin\theta$ for the bottom picture where $\theta_1 < \theta \le \theta_2$. Observe
that for an image segment of length $\ell$ to fit in the rectangle, $\ell$ must be less than or equal
to the base, $R + r + L\cos\theta$. After shrinking the rectangle along the base by $\ell$, the area
of the region is

$$A = \begin{cases}
(R + r + L\cos\theta - \ell)\,2r & \text{if } \theta \in [0, \theta_1],\ \ell \le R + r + L\cos\theta, \\
(R + r + L\cos\theta - \ell)(R + r - L\sin\theta) & \text{if } \theta \in [\theta_1, \theta_2],\ \ell \le R + r + L\cos\theta, \\
0 & \text{otherwise.}
\end{cases} \tag{4.4}$$

Note that $R + r - L\sin\theta \ge 0$, since $\theta \le \theta_2 = \sin^{-1}\frac{R + r}{L}$.
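
The rectangular overestimate of Equation 4.4 is easy to evaluate directly. The following is a minimal sketch under the assumption that the circles do not overlap ($L > R + r$); the function names are hypothetical and the numeric values are made up for illustration.

```python
import math

def tangent_angles(R, r, L):
    """theta_1 and theta_2 of Equations 4.1 and 4.2 for non-overlapping circles."""
    if L <= R + r:
        raise ValueError("this sketch assumes non-overlapping circles (L > R + r)")
    return math.asin((R - r) / L), math.asin((R + r) / L)

def translation_area(theta, R, r, L, ell):
    """Area of the shrunken bounding rectangle (Equation 4.4) for an image
    segment of length ell at orientation theta (radians)."""
    theta1, theta2 = tangent_angles(R, r, L)
    base = R + r + L * math.cos(theta)
    if not (0.0 <= theta <= theta2) or ell > base:
        return 0.0
    # Height is 2r up to theta_1, then shrinks as the orientation increases.
    height = 2.0 * r if theta <= theta1 else R + r - L * math.sin(theta)
    return (base - ell) * height

# Illustrative values only: R = 3, r = 2, L = 10, segment length 4.
print(translation_area(0.0, 3.0, 2.0, 10.0, 4.0))
```

At $\theta = \theta_1$ the two height expressions coincide ($R + r - L\sin\theta_1 = 2r$), so the area is continuous across the two cases.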


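With respect to the image, Fig. 4-7 shows that the area of translations for the same
image segment is

$$A_I = (w - \ell\cos\theta_I)(h - \ell\sin\theta_I) \tag{4.5}$$

where $w$ and $h$ are the width and height of the image and $\theta_I$ is the orientation of the
segment with respect to the image. The selectivity of a random line segment of known
length and orientation is $A / A_I$.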

In general, there will be several line segments that fall within a line uncertainty
region, and the line segments will have different lengths and orientations. To account
for orientation, we can assume that random line segments are equally likely to fall at
any angle. Then we can integrate the formulas for $A$ and $A_I$ over their respective ranges
of allowable orientations to get volumes of allowable positions of a random line segment
(with known length). Integrating the two area expressions in Equation 4.4 over an
arbitrary range $[\omega_1, \omega_2]$ gives two corresponding volume expressions (see Appendix F.3):

$$v_1(\omega_1, \omega_2) = (R + r - \ell)\,2r\,(\omega_2 - \omega_1) + 2rL(\sin\omega_2 - \sin\omega_1) \tag{4.6}$$

$$\begin{aligned}
v_2(\omega_1, \omega_2) ={}& (R + r - \ell)(R + r)(\omega_2 - \omega_1) + (R + r - \ell)L(\cos\omega_2 - \cos\omega_1) \\
& + (R + r)L(\sin\omega_2 - \sin\omega_1) - \tfrac{1}{2}L^2(\sin^2\omega_2 - \sin^2\omega_1)
\end{aligned} \tag{4.7}$$

From Equation 4.4, the range of $\theta$ is divided into two intervals at $\theta = \theta_1$. Also in
Equation 4.4, the length of the image segment constrains the range of orientations such
that $\ell \le R + r + L\cos\theta$, or equivalently, $\cos\theta \ge \frac{\ell - (R + r)}{L}$, or $\theta \le \phi$, where

$$\phi = \cos^{-1}\frac{\ell - (R + r)}{L} \tag{4.8}$$

Note that $\phi$ exists iff $R + r - L \le \ell \le R + r + L$. The first inequality holds since the
circles do not overlap, and the second must be true for the image segment to fit in the
uncertainty region (Fig. 4-1). From these constraints, the volume $V$ that corresponds to
the area $A$ in Equation 4.4 is given by

$$V = \begin{cases}
v_1(0, \phi) & \text{if } \phi < \theta_1,\ \ell \le R + r + L, \\
v_1(0, \theta_1) + v_2(\theta_1, \phi) & \text{if } \theta_1 \le \phi \le \theta_2,\ \ell \le R + r + L, \\
v_1(0, \theta_1) + v_2(\theta_1, \theta_2) & \text{if } \theta_2 < \phi,\ \ell \le R + r + L, \\
0 & \text{otherwise.}
\end{cases} \tag{4.9}$$

Integrating $A_I$ (Equation 4.5) from $\theta_I = -\frac{\pi}{2}$ to $\theta_I = \frac{\pi}{2}$ gives

$$V_I = \pi w h - 2\ell(w + h) + \ell^2 \tag{4.10}$$

The selectivity equals $V / V_I$.

These equations assume that the length $\ell$ of the image line segment is known. It would
be convenient to integrate out the length as I did for orientation, but in real images it is
not fair to assume that all lengths are equally likely. One possibility is to measure the
distribution of image segment lengths over a large set of typical images, and integrate
over the distribution. A simpler approach is to measure the average length of an image
segment and use the average length for $\ell$ in the above equations. Alternatively, it may
be possible to estimate the percentage, say $\alpha$, of a model segment that is broken up by
the feature detector; this gives $\ell = (1 - \alpha)L$ [Grimson91].
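
As a sanity check on the closed forms, the sketch below evaluates the volume $V$ for non-overlapping circles from Equations 4.6 through 4.9 and compares it with a numerical integration of the rectangle area of Equation 4.4. It is an illustration under stated assumptions, not the thesis code: the function names are hypothetical, the test values are made up, and a simple midpoint rule stands in for exact integration.

```python
import math

def v1(w1, w2, R, r, L, ell):
    """Equation 4.6: integral of (R + r + L cos t - ell) * 2r over [w1, w2]."""
    return (R + r - ell) * 2.0 * r * (w2 - w1) + 2.0 * r * L * (math.sin(w2) - math.sin(w1))

def v2(w1, w2, R, r, L, ell):
    """Equation 4.7: integral of (R + r + L cos t - ell)(R + r - L sin t)."""
    return ((R + r - ell) * (R + r) * (w2 - w1)
            + (R + r - ell) * L * (math.cos(w2) - math.cos(w1))
            + (R + r) * L * (math.sin(w2) - math.sin(w1))
            - 0.5 * L * L * (math.sin(w2) ** 2 - math.sin(w1) ** 2))

def volume_nonoverlapping(R, r, L, ell):
    """Equation 4.9 for circles with L > R + r."""
    if L <= R + r:
        raise ValueError("circles overlap; Equation 4.9 does not apply")
    if ell > R + r + L:
        return 0.0
    theta1 = math.asin((R - r) / L)
    theta2 = math.asin((R + r) / L)
    # The argument lies in [-1, 1] here; clamp only to guard against rounding.
    phi = math.acos(max(-1.0, min(1.0, (ell - (R + r)) / L)))
    if phi < theta1:
        return v1(0.0, phi, R, r, L, ell)
    if phi <= theta2:
        return v1(0.0, theta1, R, r, L, ell) + v2(theta1, phi, R, r, L, ell)
    return v1(0.0, theta1, R, r, L, ell) + v2(theta1, theta2, R, r, L, ell)

def rectangle_area(theta, R, r, L, ell):
    """Equation 4.4, used only for the numerical cross-check."""
    base = R + r + L * math.cos(theta)
    if ell > base:
        return 0.0
    height = 2.0 * r if theta <= math.asin((R - r) / L) else R + r - L * math.sin(theta)
    return (base - ell) * height

def midpoint_integral(f, a, b, steps=20000):
    """Plain midpoint rule; adequate for this piecewise-smooth integrand."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

R, r, L, ell = 3.0, 2.0, 10.0, 4.0
numeric = midpoint_integral(lambda t: rectangle_area(t, R, r, L, ell), 0.0, math.asin((R + r) / L))
print(volume_nonoverlapping(R, r, L, ell), numeric)
```

The two printed numbers should agree closely, which confirms that Equations 4.6 and 4.7 are the orientation integrals of the two area expressions in Equation 4.4.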


4.2.2  Overlapping  uncertainty  circles 

Occasionally the uncertainty circles may overlap, either by intersection or inclusion. The
uncertainty circles intersect if $R - r \le L \le R + r$. In this case $\theta_2$ is undefined, unless
$L = R + r$ (Equation 4.2). Also, $\phi$ will be undefined if $\ell < R + r - L$ (Equation 4.8), but
in this situation the length constraint is reached after $\theta > \pi/2$. To avoid redundancy, we
are only interested in orientations of the image segment that are in the range $[0, \pi/2]$. So
for convenience we define $\phi$ to be $\pi/2$ whenever $\ell < R + r - L$. As with the situation of
non-overlapping uncertainty circles, there are two cases for the height of the rectangle,
depending on whether the orientation of the image segment is less than or greater than
$\theta_1 = \sin^{-1}\frac{R - r}{L}$ (see Fig. 4-8). In addition, however, there are two cases for the base
of the rectangle (Fig. 4-8), depending on whether the orientation is less than or greater
than

$$\theta_1' = \pi/2 - \theta_1 = \cos^{-1}\frac{R - r}{L} \tag{4.11}$$

There are two basic rules for computing the height and base of the rectangle: (1) When
$\theta \le \theta_1$, use $2r$ for the height; otherwise use $R + r - L\sin\theta$. (2) When $\theta \le \theta_1'$, use
$R + r + L\cos\theta$ for the base; otherwise use $2R$. These two rules lead to four area formulas:

$$a_1 = (R + r + L\cos\theta - \ell)\,2r \tag{4.12}$$

$$a_2 = (R + r + L\cos\theta - \ell)(R + r - L\sin\theta) \tag{4.13}$$

$$a_3 = (2R - \ell)\,2r \tag{4.14}$$

$$a_4 = (2R - \ell)(R + r - L\sin\theta) \tag{4.15}$$

To get the corresponding volume formulas, we need to integrate these formulas over the
range of $\theta$. Notice that the first two formulas appear in Equation 4.4; consequently,
$v_1(\omega_1, \omega_2)$ and $v_2(\omega_1, \omega_2)$ are given by Equations 4.6 and 4.7, respectively. From
Appendix F.2, we have that

$$v_3(\omega_1, \omega_2) = (2R - \ell)\,2r\,(\omega_2 - \omega_1) \tag{4.16}$$

$$v_4(\omega_1, \omega_2) = (2R - \ell)(R + r)(\omega_2 - \omega_1) + (2R - \ell)L(\cos\omega_2 - \cos\omega_1) \tag{4.17}$$
Figure 4-8: Cases where uncertainty circles intersect. The orientation of the image
segment increases from picture a to picture e. Let $r$ be the radius of the smaller
circle, $R$ be the radius of the larger circle, $L$ be the distance between the center
points, $\theta$ be the orientation of the image segment, $b$ be the base of the rectangle,
and $h$ be the height of the rectangle. Then

a. $\theta < \theta_1$:  $b = R + r + L\cos\theta$,  $h = 2r$
b. $\theta = \theta_1$:  $b = R + r + L\cos\theta$,  $h = 2r = R + r - L\sin\theta$
c. $\theta_1 < \theta < \pi/2 - \theta_1$:  $b = R + r + L\cos\theta$,  $h = R + r - L\sin\theta$
d. $\theta = \pi/2 - \theta_1$:  $b = R + r + L\cos\theta = 2R$,  $h = R + r - L\sin\theta$
e. $\pi/2 - \theta_1 < \theta$:  $b = 2R$,  $h = R + r - L\sin\theta$
Rules 1 and 2 apply while $\theta \le \phi$; otherwise the image segment is too long to fit in
the rectangular box. However, this constraint only applies if $\phi \le \theta_1'$, because as soon as
$\theta > \theta_1'$, the base of the bounding rectangle does not change any more (see cases d and e
in Fig. 4-8). Therefore, if $\phi > \theta_1'$, the length of the image segment does not constrain
the range of orientations, in which case the maximum orientation of the image segment
is $\pi/2$ since the uncertainty circles intersect.

As a final constraint, there exists some volume of translations as long as $\ell \le R + r + L$,
because then the image segment fits in the rectangular box when $\theta = 0$, which is when
the orientation of the image segment is the same as the orientation of the model segment.
Otherwise the volume of translations is zero.

Putting these constraints together with the volume expressions,

$$V = \begin{cases}
v_1(0, \phi) & \text{if } \phi < \theta_1, \theta_1',\ \ell \le R + r + L, \\
v_1(0, \theta_1) + v_2(\theta_1, \phi) & \text{if } \theta_1 \le \phi \le \theta_1',\ \ell \le R + r + L, \\
v_1(0, \theta_1) + v_2(\theta_1, \theta_1') + v_4(\theta_1', \pi/2) & \text{if } \theta_1 < \theta_1' < \phi,\ \ell \le R + r + L, \\
v_1(0, \theta_1') + v_3(\theta_1', \pi/2) & \text{if } \theta_1' \le \phi < \theta_1,\ \ell \le R + r + L, \\
v_1(0, \theta_1') + v_3(\theta_1', \theta_1) + v_4(\theta_1, \pi/2) & \text{if } \theta_1' \le \theta_1 \le \phi,\ \ell \le R + r + L, \\
0 & \text{otherwise.}
\end{cases} \tag{4.18}$$

If the circles overlap but do not intersect, then the smaller circle is contained in the
larger, as in Fig. 4-9. In this case, $L < R - r$. After shrinking by $\ell$ along the base, the
rectangle in the figure has area $(2R - \ell)\,2r$. Integrating this expression gives

$$V = \begin{cases}
2\pi r(2R - \ell) & \text{if } \ell \le 2R, \\
0 & \text{otherwise.}
\end{cases} \tag{4.19}$$

When the uncertainty circles overlap, the selectivities for lines may be larger than
for points. This is in part because for lines we did not insist that the endpoints be
unoccluded. In addition, when the circles overlap the rectangular upper bound is not as
tight an estimate. Since we are using line features to improve on points, we could prevent
lines from doing worse by instead using the selectivities of the endpoints whenever their
average selectivity is less than the line selectivity. In effect, this insists that the endpoints
be unoccluded if the predicted model edge is short enough that the endpoint uncertainty
regions overlap.
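4.2.3 Summary


Given a model line segment, we can compute its selectivity, $\mu$, as follows. Let $r$ and $R$
be the radii of the two uncertainty circles for the endpoints of the line segment, such
that $r \le R$. $r$ and $R$ can be computed using the technique given in Chapter 3 or else,
if the models are planar, using the known analytic solution. Next, let $L$ be the distance
between the centers of the two circles, and let $\ell$ be the expected length of a random line
segment in the image. Define

$$\begin{aligned}
v_1(\omega_1, \omega_2) ={}& (R + r - \ell)\,2r\,(\omega_2 - \omega_1) + 2rL(\sin\omega_2 - \sin\omega_1) \\
v_2(\omega_1, \omega_2) ={}& (R + r - \ell)(R + r)(\omega_2 - \omega_1) + (R + r - \ell)L(\cos\omega_2 - \cos\omega_1) \\
& + (R + r)L(\sin\omega_2 - \sin\omega_1) - \tfrac{1}{2}L^2(\sin^2\omega_2 - \sin^2\omega_1) \\
v_3(\omega_1, \omega_2) ={}& (2R - \ell)\,2r\,(\omega_2 - \omega_1) \\
v_4(\omega_1, \omega_2) ={}& (2R - \ell)(R + r)(\omega_2 - \omega_1) + (2R - \ell)L(\cos\omega_2 - \cos\omega_1)
\end{aligned}$$

If $R + r < L$, let

$$\theta_1 = \sin^{-1}\frac{R - r}{L}, \qquad \theta_2 = \sin^{-1}\frac{R + r}{L}, \qquad \phi = \cos^{-1}\frac{\ell - (R + r)}{L}$$

Otherwise, if $R - r \le L \le R + r$, let

$$\theta_1 = \sin^{-1}\frac{R - r}{L}, \qquad \theta_1' = \cos^{-1}\frac{R - r}{L}, \qquad
\phi = \begin{cases}
\cos^{-1}\frac{\ell - (R + r)}{L} & \text{if } \ell \ge R + r - L, \\
\pi/2 & \text{otherwise.}
\end{cases}$$

Next, if $R + r < L$,

$$V = \begin{cases}
v_1(0, \phi) & \text{if } \phi < \theta_1,\ \ell \le R + r + L, \\
v_1(0, \theta_1) + v_2(\theta_1, \phi) & \text{if } \theta_1 \le \phi \le \theta_2,\ \ell \le R + r + L, \\
v_1(0, \theta_1) + v_2(\theta_1, \theta_2) & \text{if } \theta_2 < \phi,\ \ell \le R + r + L, \\
0 & \text{otherwise.}
\end{cases}$$

Otherwise, if $R - r \le L \le R + r$,

$$V = \begin{cases}
v_1(0, \phi) & \text{if } \phi < \theta_1, \theta_1',\ \ell \le R + r + L, \\
v_1(0, \theta_1) + v_2(\theta_1, \phi) & \text{if } \theta_1 \le \phi \le \theta_1',\ \ell \le R + r + L, \\
v_1(0, \theta_1) + v_2(\theta_1, \theta_1') + v_4(\theta_1', \pi/2) & \text{if } \theta_1 < \theta_1' < \phi,\ \ell \le R + r + L, \\
v_1(0, \theta_1') + v_3(\theta_1', \pi/2) & \text{if } \theta_1' \le \phi < \theta_1,\ \ell \le R + r + L, \\
v_1(0, \theta_1') + v_3(\theta_1', \theta_1) + v_4(\theta_1, \pi/2) & \text{if } \theta_1' \le \theta_1 \le \phi,\ \ell \le R + r + L, \\
0 & \text{otherwise.}
\end{cases}$$

Otherwise, $L < R - r$, and

$$V = \begin{cases}
2\pi r(2R - \ell) & \text{if } \ell \le 2R, \\
0 & \text{otherwise.}
\end{cases}$$

Finally,

$$V_I = \pi w h - 2\ell(w + h) + \ell^2$$

$$\mu = \frac{V}{V_I}$$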

4.3  Expected  Selectivities  of  Line  Features 

To compare the effect of line segments versus points, the next experiment estimates the
expected selectivity of line features for the telephone model. The expected selectivity for
random models should be similar.

Experiment 7: Expected selectivity of line features for the telephone model

Method

To compute the expected selectivity, I used the formula given in the last section. I ran
a series of the same trials from Experiments 5 and 6 when the selectivity of point features
was computed. For each trial, I used each pair of uncertainty circles that corresponds
to a line segment in the telephone model (Fig. 3-2) and computed the line segment
selectivities. This was repeated for various lengths of the average image line segment and
for various amounts of fragmentation, $\alpha$.
Length   Selectivity
   5     .001618
  10     .001577
  20     .001491
  30     .001396
  40     .001286
  50     .001166
  60     .001033
  70     .0009125
  80     .0008085
  90     .0007057
 100     .0006231

Table 4.1:
Expected selectivities of line features for various lengths of an image segment, using the telephone model.

$\alpha$   Selectivity
0.00     .000617
0.25     .001017
0.50     .001311
0.75     .001550
1.00     .001750

Table 4.2:
Expected selectivities of line features for various amounts of fragmentation, $\alpha$, using the telephone model.


Results and Discussion

For 1000 trials, the selectivities of 9560 line uncertainty regions were computed and
averaged. Tables 4.1 and 4.2 give the results. As expected, the selectivities for lines are
much less than for points (compare to Table 3.6). For the telephone, we can see that the
largest selectivity using line features, .001750, is less than half the selectivity using point
features, .003722.
Chapter 5

Sensitivity to False Positives


There are a number of important questions we would like to answer which depend on
the selectivity of a model feature. In particular, given that there is occlusion, what
percent of the total length of the model features must be matched in order to keep the
probability of a false positive less than some limit? How does this percentage vary with
the numbers of model and image features? Also, how many image features can there be
before the probability of a false positive exceeds some limit, that is, how much clutter
can the system withstand?

Grimson et al. have shown how to use the expected selectivity of the uncertainty
regions to answer the above questions [Grimson91] [Grimson92a] [Grimson92b], and so
I will apply their analysis here. Let $\bar{\mu}$ be the expected selectivity, let $s$ be the number
of unmatched features in the image, let $m$ be the number of unmatched features in the
model, and let $m'$ be the number of point features in the model that are used for generating
hypotheses. Assuming that the $s$ unmatched image features occur independently
and at random, the probability of at least one image feature appearing in a propagated
region with selectivity $\bar{\mu}$ is

$$p = 1 - (1 - \bar{\mu})^s \tag{5.1}$$

The probability of exactly $k$ of the $m$ propagated regions having at least one random
feature is

$$q_k = \binom{m}{k} p^k (1 - p)^{m - k} \tag{5.2}$$

The probability of at least $k$ of the $m$ propagated regions having at least one random
feature is

$$w_k = 1 - \sum_{i=0}^{k-1} q_i \tag{5.3}$$

$w_k$ is the probability of a false positive of size $k$. If we match a fixed image triple to all
possible model triples, the probability that at least one of the matches leads to a false
positive of size $k$ is

$$e_k = 1 - (1 - w_k)^{\binom{m'}{3}} \tag{5.4}$$
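
The chain of probabilities above translates directly into code. The sketch below is a hedged illustration with hypothetical function names; in particular, it assumes that "all possible model triples" means the $\binom{m'}{3}$ unordered triples of model points, which should be checked against the cited analysis.

```python
import math

def region_hit_probability(mu, s):
    """Probability that at least one of s random image features lands in a
    region of selectivity mu (Equation 5.1)."""
    return 1.0 - (1.0 - mu) ** s

def exactly_k(k, m, p):
    """Binomial probability that exactly k of m regions are hit (Equation 5.2)."""
    return math.comb(m, k) * p ** k * (1.0 - p) ** (m - k)

def at_least_k(k, m, p):
    """Probability that at least k of m regions are hit. Summing the upper tail
    is equivalent to one minus the sum of the lower tail, and it avoids
    cancellation when the result is tiny."""
    return min(1.0, sum(exactly_k(i, m, p) for i in range(k, m + 1)))

def any_triple_false_positive(k, m, m_prime, p):
    """Probability that at least one model triple yields a false positive of
    size k. Assumption: the number of triples is C(m_prime, 3)."""
    w = at_least_k(k, m, p)
    if w >= 1.0:
        return 1.0
    n_triples = math.comb(m_prime, 3)
    # 1 - (1 - w)^n computed stably for very small w.
    return -math.expm1(n_triples * math.log1p(-w))

p = region_hit_probability(0.001, 100)
print(p, at_least_k(10, 20, p), any_triple_false_positive(10, 20, 20, p))
```

Using `log1p` and `expm1` keeps the last step accurate when the per-triple probability is many orders of magnitude below one, which is the regime of interest here.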

5.1  Limits  on  Scene  Clutter 

A recognition scheme based on extended model features will suffer from false positives
if a scene becomes extremely cluttered. It would be useful, then, to know how much
clutter a recognition system can accommodate before the probability of a false positive is
significant. We can use Equation 5.4 to estimate this limit. To allow for partial occlusion,
let $f$ be the fraction of model features that must be matched to keep the probability of
a false positive at most $\delta$, where $\delta$ is a preset limit. Substituting $mf$ for $k$, we want to
find the maximum $s$ such that $e_{mf} \le \delta$. This inequality can be solved numerically to get
the maximum $s$.

Table 5.1 shows the results for $\delta = .001$ (the numbers for $\delta = .01$ and $\delta = .0001$
are similar). Real images can easily contain as many as 500 features. The limits for the
uncertainty propagation technique of [Grimson92a] are very low. Although the numbers
are greatly improved using uncertainty circles, it is only when line segments are used that
numbers of features are in the range of images with substantial amounts of scene clutter.
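
One way to solve the inequality numerically is a binary search on $s$, since the false-positive probability grows monotonically with the number of image features. The sketch below is illustrative only and is not expected to reproduce Table 5.1 exactly: the function names are hypothetical, $mf$ is rounded up to an integer, and the number of model triples is assumed to be $\binom{m'}{3}$.

```python
import math

def false_positive_probability(s, mu, m, m_prime, k):
    """Probability that some model triple gives a false positive of size k
    when there are s unmatched image features."""
    p = 1.0 - (1.0 - mu) ** s
    w = sum(math.comb(m, i) * p ** i * (1.0 - p) ** (m - i) for i in range(k, m + 1))
    if w >= 1.0:
        return 1.0
    # Assumption: C(m_prime, 3) model triples are tried.
    return -math.expm1(math.comb(m_prime, 3) * math.log1p(-w))

def max_clutter(mu, m, m_prime, f, delta):
    """Largest s with false-positive probability at most delta (0 < delta < 1)."""
    k = math.ceil(m * f)  # assumption: round the required match count up

    def ok(s):
        return false_positive_probability(s, mu, m, m_prime, k) <= delta

    hi = 1
    while ok(hi):          # grow until the limit is exceeded
        hi *= 2
    lo = hi // 2           # ok(lo) holds (lo is 0 when hi is 1)
    while hi - lo > 1:     # invariant: ok(lo) and not ok(hi)
        mid = (lo + hi) // 2
        if ok(mid):
            lo = mid
        else:
            hi = mid
    return lo

# Illustrative call using a line selectivity from Table 4.2 and m = m' = 200.
print(max_clutter(0.000617, 200, 200, 0.5, 0.001))
```

The same search applies to points or lines; only the expected selectivity and the feature counts change.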


5.2  Accepting  a  Partial  Match 

When the extended features of a model are used for verification, we would like to know
what percent of the extended features must be matched before we can stop looking for
more matches. We can use Equation 5.5 to set a threshold on this percentage such that
the chance that a false positive will arise is less than a preset limit. Specifically, given a
three-point match, we can compute the minimum f such that the false positive probability
for k = mf is less than $\delta$, where $\delta$ is preset.

Table 5.2 shows the results for line segments. For comparison, the recognition system of
[Huttenlocher88] used f = .5 as a threshold on the percentage of the model to verify;




Method                       a       f = 0.25      0.50          0.75

Line Uncertainty Regions     0.00    101           537           1200
Line Uncertainty Regions     0.25    102           (illegible)   (illegible)
Line Uncertainty Regions     0.50    79            205           592
Line Uncertainty Regions     0.75    67            221           500
Line Uncertainty Regions     1.00    59            198           415
Uncertainty Circles                  (illegible)   97            210
[Grimson92a]                         15            (illegible)   95

Table 5.1: Approximate limits on the number of sensory features for different amounts of fragmentation a and
for different fractions f of unoccluded model features. Table is for $\epsilon = 5$, $\delta = .001$, for line segments
w = w' = 200 (line uncertainty regions), and for points w = 107 and w' = 200 (uncertainty circles and
[Grimson92a]).


o        $\delta$ = .01    .001    .0001

0.00     .36         .38     .41
0.25     .49         .51     .54
0.50     .57         .60     .62
0.75     .63         .66     .68
1.00     .67         .70     .72

Table 5.2: Predicted termination thresholds for different amounts of occlusion o, and for different limits on the
false positive probability. Table is for $\epsilon = 5$, w = w' = 200, and s = 500.


this agrees with the table when the amount of fragmentation is a = 25%.


5.3  Conclusion 

The expected selectivities of model features can be used to estimate two important
quantities. The first is a limit on the number of spurious features there can be before the
likelihood of a false positive becomes significant. With such a limit, we can tell in advance,
given a model and an image, whether the recognition system is likely to succeed in finding
the model in the image.


The second quantity is a threshold on the percentage of model features to match.
Such a threshold can be used actively by a verification system to cut short the search for
matches.


Chapter  6 

Likelihood  of  a  Hypothesis 


In this chapter, I give a criterion which can be used to rank a hypothesis of three matched
model and image points, according to how likely it is to be correct. This step is similar
in purpose to the quick check used by Huttenlocher at the beginning of his verification
stage [Huttenlocher88]. Huttenlocher used simple heuristics to filter hypotheses, whereas
here I utilize the uncertainty propagation analysis to rank hypotheses formally based on
a probabilistic model.

At the point where likelihoods are assigned (step 2d of the algorithm of Section 1.2),
the alignment system has hypothesized a pairing between three model and image points,
and the basic question is whether or not the pairing is correct. To make this determination,
the system looks for additional matches to confirm the three suggested ones.
Using the hypothesis, the extended model features (points, line segments, segments of
curves) are transformed and projected into the image. Then the correct search regions
are computed and searched for additional matches. Once the additional matches have
been collected, line segments that are nearly collinear and have proximate opposite
endpoints should be combined. Also, curve segments should be combined if they appear
broken. Given a set of candidate image features for each predicted model feature, we
wish to estimate how likely it is that the hypothesis is correct.

6.1 Formula for the Likelihood


To compute the likelihood, assume in general that all features in the image that do not
come from the model arise at random. In truth, such features arise from clutter in the
scene, occluding objects, and noise; so I am assuming the features these events introduce
effectively occur at random. This assumption has been made before for analyzing the
verification stage of recognition and has yielded accurate results [Grimson91]. In addition,
I assume that none of the uncertainty regions overlap. Let M be the event that the
particular matches for the model features were found, and let H be the event that a
given three-point match is correct. Then the probability that the matches arose when
the model was present is $p(M \mid H)$. Similarly, the probability that the matches arose at
random is the probability that the matches arose when the hypothesis is wrong, which
is $p(M \mid \bar{H})$. However, we are interested in the probability of H given the event M. From
Bayes' rule,


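$$p(H \mid M) = \frac{p(M \mid H)\,p(H)}{p(M)} = \frac{p(M \mid H)\,p(H)}{p(M \mid H)\,p(H) + p(M \mid \bar{H})\,p(\bar{H})} = \frac{1}{1 + \dfrac{p(M \mid \bar{H})}{p(M \mid H)}\left(\dfrac{1}{p(H)} - 1\right)} \tag{6.1}$$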
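The last form of Equation 6.1 needs only the ratio $p(M \mid \bar{H}) / p(M \mid H)$ and the prior $p(H)$. A minimal sketch of that computation follows, with a second function that evaluates the unsimplified Bayes expression as a cross-check. The function names are hypothetical.

```python
def posterior(likelihood_ratio, prior):
    """p(H|M) from the ratio p(M|not H) / p(M|H) and the prior p(H), per Equation 6.1."""
    if not 0.0 < prior <= 1.0:
        raise ValueError("prior must lie in (0, 1]")
    return 1.0 / (1.0 + likelihood_ratio * (1.0 / prior - 1.0))


def posterior_direct(p_m_given_h, p_m_given_not_h, prior):
    """The same quantity from the unsimplified form of Bayes' rule."""
    evidence = p_m_given_h * prior + p_m_given_not_h * (1.0 - prior)
    return p_m_given_h * prior / evidence
```

Working with the ratio avoids evaluating either likelihood on its own, which matters once the two share large common factors.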


Notice that we also need to compute $p(H)$, the a priori probability that the three
point match is correct. Let $H_m$ be the event that the three matched model points are
visible, and let $H_i$ be the event that the three matched image points were produced by
the three model points. Then $p(H) = p(H_i \mid H_m)\,p(H_m)$. If we have information about
self occlusion, we may be able to estimate $p(H_m)$ for different triples of model points.
Otherwise we can assume that the model is transparent, in which case $p(H_m)$ is the same
for all triples of the model, and hence equals the probability that the model appears in the
image.

As for $p(H_i \mid H_m)$, this is the probability that the three model features project to within
the error bounds of their corresponding image features. We could estimate this off-line
for every triple of model features by sampling the viewing sphere and computing the
fraction of viewpoints from which the projected model points can be scaled, rotated in
2D, and translated in 2D to lie within the uncertainty regions of the three image points.

Alternatively, we could estimate $p(H_i \mid H_m)$ at run-time, using the pose-space analysis
in [Grimson92a]. More simply, we could assume that for the most part pose space is
uniformly distributed. Then $p(H_i \mid H_m)$ is the probability that any point in pose space
gives a triple consistent with the image points. Since there is a point in pose space for
every image triple, this probability is $(\pi\epsilon^2 / A_I)^3$, where $\epsilon$ is the error bound for the
matched image points and $A_I$ is the area of the image.

We still need to determine $p(M \mid \bar{H})$. As mentioned, $p(M \mid \bar{H})$ is the chance the matches
occurred at random. Let r equal the number of unmatched image features. Further, let
$\mu_i$ denote the selectivity of region $R_i$, and $r_i$ be the number of features found in $R_i$, for
$i = 1, \ldots, k$. (Selectivity was defined in Section 4.5.) Also, let

$$\mu_{k+1} = 1 - \sum_{i=1}^{k} \mu_i \tag{6.2}$$

$$r_{k+1} = r - \sum_{i=1}^{k} r_i \tag{6.3}$$

From the assumption that the regions do not intersect, $\mu_{k+1}$ is the selectivity of the
background. For non-intersecting regions, the chance of $r_1$ features landing in $R_1$, $r_2$
landing in $R_2$, ..., $r_k$ landing in $R_k$, and $r_{k+1}$ landing in the background is
$\mu_1^{r_1} \mu_2^{r_2} \cdots \mu_{k+1}^{r_{k+1}}$.
The number of ways to choose $r_1, r_2, \ldots, r_{k+1}$ features from r is given by the multinomial
coefficient,

$$\binom{r}{r_1, r_2, \ldots, r_{k+1}} = \frac{r!}{r_1!\, r_2! \cdots r_{k+1}!},$$

so that

$$p(M \mid \bar{H}) = \binom{r}{r_1, r_2, \ldots, r_{k+1}} \mu_1^{r_1} \mu_2^{r_2} \cdots \mu_{k+1}^{r_{k+1}} \tag{6.4}$$

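Next, assume that if the hypothesis is correct, then the model features the system found
matches for were not actually occluded. Then we get $p(M \mid H)$ by just subtracting one
feature from every propagated region:

$$p(M \mid H) = \binom{r - k}{r_1 - 1, r_2 - 1, \ldots, r_k - 1, r_{k+1}} \mu_1^{r_1 - 1} \mu_2^{r_2 - 1} \cdots \mu_k^{r_k - 1} \mu_{k+1}^{r_{k+1}} \tag{6.5}$$

Dividing Equation 6.4 by 6.5,

$$\frac{p(M \mid \bar{H})}{p(M \mid H)} = \frac{r!}{(r - k)!\; r_1 r_2 \cdots r_k}\; \mu_1 \mu_2 \cdots \mu_k \tag{6.6}$$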
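The cancellation that leads to Equation 6.6 can be checked numerically. The sketch below evaluates the closed-form ratio and, separately, the two multinomial probabilities in the forms of Equations 6.4 and 6.5. The function names are hypothetical, and the counts and selectivities in any call are made-up inputs.

```python
from math import factorial, prod


def multinomial_prob(counts, probs):
    """Multinomial probability of the given counts, the form used in Equation 6.4."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)  # stays an exact integer at every step
    return coeff * prod(p**c for p, c in zip(probs, counts))


def likelihood_ratio(r, region_counts, selectivities):
    """Closed-form ratio p(M|not H) / p(M|H) of Equation 6.6.

    r is the number of unmatched image features; region_counts[i] >= 1 is the
    number of features found in region i, whose selectivity is selectivities[i].
    """
    k = len(region_counts)
    falling = factorial(r) // factorial(r - k)  # r! / (r - k)!
    return falling / prod(region_counts) * prod(selectivities)
```

The background count and selectivity cancel in the division, which is why `likelihood_ratio` never needs them.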
6.2 Modified Formula for the Likelihood


Equation 6.1 with Equation 6.6 gives the correct likelihood according to the assumptions,
but did those assumptions give us what we want? According to the formula, when an
uncertainty region is small relative to the size of the image, the chance that a hypothesis is
correct increases as the number of matched features in a region goes up. This is because
it is unlikely for more and more features to randomly fall in the same small region. The
problem is that it is unclear that this behavior is desired. When there happen to be
many features in a region, then there probably exists some image event that violates the
assumption that the features arose at random. This event most likely is not due to the
object we are seeking, in which case we would not want the probability of the hypothesis
to increase with the number of features in the region.

A safer approach is to not use the actual numbers of features in the regions, but only
the fact that potential matches exist. For this approach, I re-define M to be the event
that matches exist in those same uncertainty regions. As before, we assume that the
model features represented by M are not actually occluded, so that $p(M \mid H) = 1$. By so
doing, some hypotheses will be ranked higher than they should. If a threshold is used
to take the best-ranking hypotheses, then there simply will be more hypotheses to verify
later. With $p(M \mid H) = 1$, Equation 6.1 becomes


$$p(H \mid M) = \frac{1}{1 + p(M \mid \bar{H})\left(\dfrac{1}{p(H)} - 1\right)} \tag{6.7}$$


In this formula, $p(M \mid \bar{H})$ is the probability of a random conspiracy, that is, the probability
that at least one random image feature falls in every region represented by M. The
likelihood of this happening is the sum of the probabilities of all the ways random features
can fall in the regions. For r uniformly-distributed features, the chance that $r_1$ fall in
region $R_1$, $r_2$ in $R_2$, ..., $r_k$ in $R_k$ is given by Equation 6.4, which happens to be for the
old $p(M \mid \bar{H})$. As before, let $\mu_{k+1} = 1 - \sum_{i=1}^{k} \mu_i$. Also, define

$$r_{k+1}(r_1, r_2, \ldots, r_k) = r - \sum_{i=1}^{k} r_i.$$

I will abbreviate $r_{k+1}(r_1, r_2, \ldots, r_k)$ by $r_{k+1}$, but keep in mind that $r_{k+1}$ is a function
while $\mu_{k+1}$ is constant. Summing over all possible values of the $r_i$'s, for $i = 1, 2, \ldots, k$,
the chance of a random conspiracy is

$$p(M \mid \bar{H}) = \sum_{r_1 = 1}^{r - k + 1}\; \sum_{r_2 = 1}^{r - k + 2 - r_1} \cdots \sum_{r_k = 1}^{r - r_1 - r_2 - \cdots - r_{k-1}} \binom{r}{r_1, r_2, \ldots, r_{k+1}} \mu_1^{r_1} \mu_2^{r_2} \cdots \mu_{k+1}^{r_{k+1}} \tag{6.8}$$


This formula involves a large number of computations of the expression from Equation 6.4.
The number of computations is exponential in r and k, where r is the number
of unmatched image features and k is the number of predicted model features for which
potential matches exist. Appendix F derives a recurrence relation that computes one
minus the same result. Let $S_1, S_2, \ldots, S_k$ be the "sizes" of the uncertainty regions, and
let $S_I$ be the "size" of the image. For points, the sizes are given by the areas of the
uncertainty regions, and for lines the sizes are given by the volumes. The recurrence is,


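$$q_r(S_I;\, S_1, \ldots, S_k) = \begin{cases} Q_r(S_I, S_k)\,\bigl(1 - q_r(S_I - S_k;\, S_1, \ldots, S_{k-1})\bigr) + q_r(S_I;\, S_1, \ldots, S_{k-1}) & \text{if } k > 1, \\ Q_r(S_I, S_1) & \text{if } k = 1 \text{ and } S_1 < S_I, \\ 0 & \text{otherwise,} \end{cases} \tag{6.9}$$

where

$$Q_r(S_I, S) = \left(1 - \frac{S}{S_I}\right)^{r}. \tag{6.10}$$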


This expression has repeated sub-problems at every recursive call, such that only one
additional subproblem is generated at each level. At the bottom level, there are k
expressions to evaluate, namely $Q_r(S_I, S_i)$, for $i = 1, \ldots, k$, which are the only times r is
used. Dynamic programming, then, can be used to compute the result of the recurrence
in time quadratic in k. Further, since r is the exponent in the equation for $Q_r(S_I, S)$,
the time is that needed to compute the power, which is logarithmic in r.
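The recurrence can be sketched in code as follows. Two things here are assumptions rather than statements from the text: $Q_r(S_I, S)$ is taken to be $(1 - S/S_I)^r$, the chance that none of r uniformly placed features lands in a region of size S, and the regions are taken to be disjoint and to fit inside the image. Memoization stands in for the dynamic programming mentioned above, and a brute-force inclusion-exclusion sum over subsets of regions is included only as an independent check. All names are hypothetical.

```python
from functools import lru_cache
from itertools import combinations


def some_region_empty(r, image_size, region_sizes):
    """q_r(S_I; S_1, ..., S_k): chance that at least one region receives none of r features."""
    sizes = tuple(region_sizes)
    if not sizes or min(sizes) <= 0 or sum(sizes) > image_size:
        raise ValueError("regions must be positive, disjoint, and fit inside the image")

    @lru_cache(maxsize=None)
    def q(s_img, k):
        s_k = sizes[k - 1]
        if k == 1:
            # Q_r(S_I, S_1) when the region is smaller than the image, else 0.
            return (1.0 - s_k / s_img) ** r if s_k < s_img else 0.0
        region_k_empty = (1.0 - s_k / s_img) ** r  # assumed form of Q_r(S_I, S_k)
        return region_k_empty * (1.0 - q(s_img - s_k, k - 1)) + q(s_img, k - 1)

    return q(float(image_size), len(sizes))


def random_conspiracy(r, image_size, region_sizes):
    """p(M | not H): every region catches at least one random feature."""
    return 1.0 - some_region_empty(r, image_size, region_sizes)


def conspiracy_by_inclusion_exclusion(r, image_size, region_sizes):
    """Independent check that enumerates every subset of regions (exponential in k)."""
    total = 0.0
    for n in range(len(region_sizes) + 1):
        for subset in combinations(region_sizes, n):
            total += (-1) ** n * (1.0 - sum(subset) / image_size) ** r
    return total
```

For a single region of half the image size and three features, `random_conspiracy(3, 10, [5])` evaluates $1 - 0.5^3$.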


We may ask where this approach is likely to fail. The real trouble for the method is
regions where there exist potential matches, but the true feature is either hidden or was
not detected. Such regions will give positive evidence for the hypothesis, even though
the correct feature is not there. In these cases, I generously assigned $p(M \mid H) = 1$. As a
result, there may be many high-ranked hypotheses instead of a few. This situation seems
likely for point features, since spurious points can arise almost anywhere. For extended
features, on the other hand, such as line segments and segments of curves, it is much less
likely for a long feature to randomly fall in an uncertainty region, and so the chance of
the true feature being covered up while random ones appear is expected to be small.

6.3 Summary


The goal of the last two sections was to provide a means for distinguishing a few
most-probable hypotheses, using the extended features of a model. For ease of use, I will
summarize the method for line segments. It should be straightforward to apply the
method to points, since they are a simpler case. I chose line segments since points are
less useful for verification, and line segments are used more commonly.

Given a three-point hypothesis, project the model line segments into the image and
compute their line uncertainty regions, making sure to expand out the boundaries by
$\epsilon$. In detail, for each endpoint of a model line segment, first compute its uncertainty
circle: the center point is at the nominal point and the radius r equals the maximum
distance from the nominal point to one of the $8^3 = 512$ sample points, plus $\epsilon$. The
line uncertainty region is defined by the uncertainty circles for the endpoints and their
common outer tangents (Fig. 4-1). Next, search the uncertainty regions to see which
ones have candidate matches. Use the method of Chapter 4 to compute the volumes $V_i$
of each line uncertainty region and the volume $V_I$ of the image. Also, let s be the total
number of image features, let r = s - 3, and let

$$p(H) = \left(\frac{\pi \epsilon^2}{A_I}\right)^{3},$$

where $A_I$ is the area of the image.


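To use the approach of Section 6.1, next calculate the line selectivities using $\mu_i = V_i / V_I$.
Then compute

$$\frac{p(M \mid \bar{H})}{p(M \mid H)} = \frac{r!}{(r - k)!\; r_1 r_2 \cdots r_k}\; \mu_1 \mu_2 \cdots \mu_k,$$

from which the likelihood of the hypothesis is

$$p(H \mid M) = \frac{1}{1 + \dfrac{p(M \mid \bar{H})}{p(M \mid H)}\left(\dfrac{1}{p(H)} - 1\right)}.$$

To use instead the approach of Section 6.2, define the recurrence relation,

$$q_r(V_I;\, V_1, \ldots, V_k) = \begin{cases} Q_r(V_I, V_k)\,\bigl(1 - q_r(V_I - V_k;\, V_1, \ldots, V_{k-1})\bigr) + q_r(V_I;\, V_1, \ldots, V_{k-1}) & \text{if } k > 1, \\ Q_r(V_I, V_1) & \text{if } k = 1 \text{ and } V_1 < V_I, \\ 0 & \text{otherwise,} \end{cases}$$

with

$$Q_r(V_I, V) = \left(1 - \frac{V}{V_I}\right)^{r}.$$

Using the recurrence, compute $q_r(V_I;\, V_1, \ldots, V_k)$. Then let

$$p(M \mid \bar{H}) = 1 - q_r(V_I;\, V_1, \ldots, V_k)$$

and, finally, the likelihood of the hypothesis is

$$p(H \mid M) = \frac{1}{1 + p(M \mid \bar{H})\left(\dfrac{1}{p(H)} - 1\right)}.$$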

6.4  Precomputing  the  Likelihoods 


For  flat  models,  it  is  known  that  the  size  of  the  uncertainty  region  for  a  predicted  model 
feature  does  not  change  with  viewpoint,  that  is.  the  size  does  not  change  as  different  image 
points  are  hypothesized  to  match  the  same  triple  of  model  points.  For  solid  models,  it 
may  be  the  case  that  the  size  of  an  uncertainty  region  changes  only  a  little  with  viewpoint, 
if  the  model  is  not  very  elongated.  In  this  case,  it  would  be  possible  to  pre-compute  the 
uncertainty  regions  for  each  triple  of  model  points.  In  addition,  for  each  such  triple, 
the  likelihoods  could  be  computed  in  advance  for  different  subsets  of  the  corresponding 
propagated  regions.  Then  the  model  triples  could  be  ordered  in  advance  according  to 
how  likely  they  are  of  having  a  subset  of  propagated  regions  with  a  high  likelihood  of 
being  correct.  Despite  the  possibilities,  it  must  first  be  determined  how  sensitive  are  the 
uncertainty  regions  for  out-of-plane  model  features  to  changing  viewpoint. 
6.5 Discussion

In deciding on a three-point match, the main mechanism we are banking on is that it is
unlikely for features to arise at random in the uncertainty regions. And so the more model
features for which we find potential matches, the more likely it is that the hypothesis is
correct. For the approach of Section 6.1, note that generally the ratio in Equation 6.6
decreases as k, the number of model features for which candidate matches were found,
increases. From Equation 6.1, this causes $p(H \mid M)$ to increase, as is desired.

A secondary effect is that finding candidate matches for many
model features may not imply that the hypothesis is correct. In particular, if matches
are found within uncertainty regions that are very large, then the matches could just as
easily have arisen randomly. It is important to know when we are in such a situation,
and to reduce our confidence in the hypothesis. This effect depends on the sizes of
the uncertainty regions, which depend on which three points from the model are being
used in the hypothesis and where the model is being viewed from. For the approach of
Section 6.1, larger uncertainty regions cause the $\mu_i$ in Equation 6.6 to increase. From
Equation 6.1, this causes $p(H \mid M)$ to decrease, as is desired.

A tertiary effect on the likelihood of a hypothesis is the chance that the three model
points projected to their hypothesized corresponding image points. Although it may seem
related, this issue is orthogonal to the issue of the effect on the sizes of the uncertainty
regions  due  to  where  the  model  is  viewed  from.  To  see  this,  note  that  if  the  model  triple 
is  equally  likely  to  project  to  any  image  triple,  it  could  still  be  that  which  image  triple  it 
projects  to  makes  a  big  difference  in  the  sizes  of  the  uncertainty  regions.  If  we  suppose 
the  model  triple  is  equally-likely  to  be  seen  from  any  direction,  then  it  may  be  that 
some  image  triples  are  very  unlikely  to  arise,  despite  the  fact  that  every  image  triple  is 
possible.  The  reason  this  issue  is  important  is  that  it  may  be  possible  to  have  a  match 
for  several  model  features  that  is  unlikely  to  have  arisen  at  random,  but  at  the  same  time 
the  model  triple  has  almost  a  zero  chance  of  projecting  to  its  hypothesized  image  triple. 
Using  p(H).  the  analysis  above  gives  us  a  way  of  trading  off  these  effects. 


Chapter  7 
Conclusion 


This thesis has four main contributions. The first is a geometric understanding of a
fundamental problem in computer recognition, namely, the solution for 3D pose from three
corresponding points under weak-perspective projection (Chapter 2). A new solution
to the problem was given, and the situations where there is no solution and where the
solution is unstable were described. In addition, the new solution was put in perspective
with previous solutions, and the three most related earlier solutions were presented in
detail and compared.

In addition, Chapter 2 showed how the image position of an unmatched model point
can be computed efficiently using the solution for 3D pose. In particular, Chapter 2 gave
an expression for the fourth point's image position that did not involve going through
a model-to-image transformation, but instead computed the position directly from the
distances between the three matched points. This is important for alignment-style
recognition, since the image positions of the unmatched model points are computed many
times while searching for the correct pose of the model.

The second major contribution of this thesis is an error analysis of point features
for alignment-style recognition of 3D models from 2D images (Chapter 3). The earlier
analysis of [Grimson92a] was conservative in its bounds on the propagated uncertainty,
and Chapter 3 showed we can do better. In fact, the analysis in Chapter 3 is almost
always a solution, which means its bounds are exact, notably except where the 3D pose
solution is inherently unstable. Chapter 3 showed pictures of what the true uncertainty
regions look like when the bounds are not exact. In these cases, the bounds conservatively
overestimate the exact bounds.

Even though the error propagation in Chapter 3 is generally accurate,
the technique has the disadvantage of being numerical. Nevertheless, so was the only
previous error propagation technique. Moreover, for most recognition problems, the time
to compute the solution is effectively constant, as though the solution were analytic.

Another contribution of this thesis is a formula for the selectivity of line features
(Chapter 4). The selectivity of a feature can be used to infer the expected performance
of recognition systems. It can also be used to set a threshold on how much of a model
must be identified in an image before the object is recognized (Chapter 5). To date, a
selectivity formula for line features has been provided for recognition involving 2D models
and 2D data, and for 3D models and 3D data [Grimson91]. The formula derived here is
the first for recognition involving 3D models and 2D data.

The fourth major contribution is a formula for the likelihood of a hypothesized
three-point match (Chapter 6). The formula applies to point or line features, and relies on
their associated uncertainty regions. The formula is intended to be used actively during
recognition to quickly filter hypotheses that have little support from the image.

These four contributions tie together well for building a fast and robust alignment
system. The uncertainty analysis provides the correct minimal search regions to guarantee
that no correct hypotheses are lost, which makes the recognition insensitive to
false negatives. Further, the uncertainty regions can be computed quickly using the error
propagation technique and the fast solution for the image position of an unmatched
model point. Once computed, the uncertainty regions usually are small enough to be
searched rapidly for candidate image features. Then, using the likelihood formula, the
current hypothesis can be evaluated.


Chapter  8 
Future  Work 


Having theoretically studied the alignment system proposed in Chapter 1, the next step
is to build an alignment system that uses just the extended features of a model to select
best hypotheses. The system would be based on geometric features, particularly points
and line segments. Furthermore, the system would be compared to other
hypothesize-and-test techniques that also use 3D models and 2D images, notably [Lowe87] and
[Huttenlocher88].

Another worthwhile study would be to build the complete models suggested in Chapter 1.
This requires obtaining complete 3D edge maps and extracting extended features
from them. Given the edge maps, it would be useful to show that they can be used to
reliably verify the presence or absence of the model when the model pose is known up to
uncertainty in the data.

Another problem is to discover the correct shapes and distributions of the uncertainty
in image features, instead of just bounding them. Jacobs observed that simply adjusting
the size of the bounded error threshold makes a big difference in the effectiveness of his
grouping system [Jacobs89]. It would be useful, then, to be able to set this threshold
automatically. More generally, it is expected that the errors in features differ significantly
across images, as well as across a single image, and that they are dependent. It would
be of interest to study the feature detection process from image formation on up to see
how errors can alter the features of a model.

Lastly, the proposed system expects a minimal amount of grouping to be performed.
Grouping is an area that has both received much attention and has much potential for
improvement. Many approaches attempt to do grouping with just line segments, but
after line segments have been extracted, too much information has been lost. There are
many techniques for segmenting images into regions based on intensities [Haralick85],
but typically these simply cluster similar intensities and do not take advantage of higher
level shape information at edges. Grouping should be performed using intensity images
together with their edges.


Appendix  A 

Rigid  Transform  between  3 
Corresponding  3D  Points 


This appendix computes a rigid transform between two sets of three corresponding points
using right-handed coordinate systems built separately on each set of three points. A
right-handed system is determined by an origin point, $o$, and three perpendicular unit
vectors, $(\hat{u}, \hat{v}, \hat{w})$. Given three points in space, $p_0$, $p_1$, $p_2$, we can construct a right-handed
system as follows: Let $p_{01} = p_1 - p_0$ and $p_{02} = p_2 - p_0$. Then let

$$o = p_0$$

$$\hat{u} = \frac{p_{01}}{\lVert p_{01} \rVert}$$

$$\hat{v} = \frac{p_{02} - (p_{02} \cdot \hat{u})\,\hat{u}}{\lVert p_{02} - (p_{02} \cdot \hat{u})\,\hat{u} \rVert}$$

$$\hat{w} = \hat{u} \times \hat{v}$$

Let $(o_1; \hat{u}_1, \hat{v}_1, \hat{w}_1)$ and $(o_2; \hat{u}_2, \hat{v}_2, \hat{w}_2)$ be the coordinate systems so defined for the original
and camera-centered points, respectively.

Given a coordinate system $(o; \hat{u}, \hat{v}, \hat{w})$, a rigid transformation that takes a point in
world coordinates to a point in that coordinate system is given by $(R, t)$, where

$$R = [\hat{u}\ \hat{v}\ \hat{w}], \quad t = o$$

(see for example [Craig55]); the transformed $p$ is $Rp + t$. Then we can bring a point $p$
from the original system to the world and then to the camera-centered system using

$$R_2\bigl(R_1^{-1}(p - t_1)\bigr) + t_2 = R_2 R_1^{-1} p + t_2 - R_2 R_1^{-1} t_1,$$

where

$$R_1 = [\hat{u}_1\ \hat{v}_1\ \hat{w}_1], \quad t_1 = o_1$$

$$R_2 = [\hat{u}_2\ \hat{v}_2\ \hat{w}_2], \quad t_2 = o_2.$$

Consequently a rigid transformation $(R, t)$ that aligns the two coordinate systems is

$$R = R_2 R_1^{-1}, \quad t = t_2 - R_2 R_1^{-1} t_1. \tag{A.1}$$

Appendix  B 

Solving  for  the  Scale  Factor 

B.1 Biquadratic for the Scale Factor


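This appendix shows that

$$4(s^2 R_{01}^2 - d_{01}^2)(s^2 R_{02}^2 - d_{02}^2) = \bigl(s^2 (R_{01}^2 + R_{02}^2 - R_{12}^2) - (d_{01}^2 + d_{02}^2 - d_{12}^2)\bigr)^2 \tag{B.1}$$

is equivalent to a biquadratic in s.

Expanding Equation B.1,

$$4\bigl(s^4 R_{01}^2 R_{02}^2 - s^2 (R_{01}^2 d_{02}^2 + R_{02}^2 d_{01}^2) + d_{01}^2 d_{02}^2\bigr) = s^4 (R_{01}^2 + R_{02}^2 - R_{12}^2)^2 - 2 s^2 (R_{01}^2 + R_{02}^2 - R_{12}^2)(d_{01}^2 + d_{02}^2 - d_{12}^2) + (d_{01}^2 + d_{02}^2 - d_{12}^2)^2$$

$$s^4 \bigl(4 R_{01}^2 R_{02}^2 - (R_{01}^2 + R_{02}^2 - R_{12}^2)^2\bigr) - 2 s^2 \bigl(2 R_{01}^2 d_{02}^2 + 2 R_{02}^2 d_{01}^2 - (R_{01}^2 + R_{02}^2 - R_{12}^2)(d_{01}^2 + d_{02}^2 - d_{12}^2)\bigr) + \bigl(4 d_{01}^2 d_{02}^2 - (d_{01}^2 + d_{02}^2 - d_{12}^2)^2\bigr) = 0$$

$$a s^4 - 2 b s^2 + c = 0,$$

where

$$a = 4 R_{01}^2 R_{02}^2 - (R_{01}^2 + R_{02}^2 - R_{12}^2)^2$$

$$b = 2 R_{01}^2 d_{02}^2 + 2 R_{02}^2 d_{01}^2 - (R_{01}^2 + R_{02}^2 - R_{12}^2)(d_{01}^2 + d_{02}^2 - d_{12}^2)$$

$$c = 4 d_{01}^2 d_{02}^2 - (d_{01}^2 + d_{02}^2 - d_{12}^2)^2.$$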

B.2  Two  Solutions  for  Scale 


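The following lemma completes the proof of Proposition 1:

Lemma: Let $t$ be either $(d_{01}/R_{01})^2$ or $(d_{02}/R_{02})^2$. Then

$$a t^2 - 2 b t + c \le 0.$$

Proof:

$$a t^2 - 2 b t + c = 4 (R_{01} R_{02} \sin\phi)^2\, t^2 - 4 (R_{01}^2 d_{02}^2 + R_{02}^2 d_{01}^2 - 2 R_{01} R_{02} d_{01} d_{02} \cos\phi \cos\psi)\, t + 4 (d_{01} d_{02} \sin\psi)^2, \quad \text{from Equations 2.18, 2.19, and 2.20} \tag{B.2}$$

$$= 4 R_{01}^2 R_{02}^2 (1 - \cos^2\phi)\, t^2 - 4 (R_{01}^2 d_{02}^2 + R_{02}^2 d_{01}^2 - 2 R_{01} R_{02} d_{01} d_{02} \cos\phi \cos\psi)\, t + 4 d_{01}^2 d_{02}^2 (1 - \cos^2\psi) \tag{B.3}$$

Suppose that $t = (d_{01}/R_{01})^2$. Then B.2 becomes

$$-4 \frac{R_{02}^2 d_{01}^4}{R_{01}^2} \cos^2\phi + 8 \frac{R_{02} d_{01}^3 d_{02}}{R_{01}} \cos\phi \cos\psi - 4 d_{01}^2 d_{02}^2 \cos^2\psi = -4 R_{02}^2 d_{01}^2 \left(\frac{d_{01}}{R_{01}} \cos\phi - \frac{d_{02}}{R_{02}} \cos\psi\right)^2.$$

Suppose instead that $t = (d_{02}/R_{02})^2$. Then B.3 becomes

$$-4 \frac{R_{01}^2 d_{02}^4}{R_{02}^2} \cos^2\phi + 8 \frac{R_{01} d_{01} d_{02}^3}{R_{02}} \cos\phi \cos\psi - 4 d_{01}^2 d_{02}^2 \cos^2\psi = -4 R_{01}^2 d_{02}^2 \left(\frac{d_{02}}{R_{02}} \cos\phi - \frac{d_{01}}{R_{01}} \cos\psi\right)^2.$$

Either way, $a t^2 - 2 b t + c \le 0$.

□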

B.3  One  Solution  for  Scale 

In the "one solution" case, we wish to know when and if $b^2 - ac = 0$ holds. Using the
result of Appendix B.5, this means that

$$4 (R_{01} d_{02})^4 \bigl(t^2 - 2\cos(\phi + \psi)\, t + 1\bigr)\bigl(t^2 - 2\cos(\phi - \psi)\, t + 1\bigr) = 0.$$

For this to hold, either

$$t^2 - 2\cos(\phi + \psi)\, t + 1 = 0 \quad \text{or} \quad t^2 - 2\cos(\phi - \psi)\, t + 1 = 0.$$

Solving for $t$ gives

$$t = \cos(\phi + \psi) \pm i \sin(\phi + \psi) \quad \text{or} \quad t = \cos(\phi - \psi) \pm i \sin(\phi - \psi), \tag{B.4}$$

where $i = \sqrt{-1}$. Consequently, there are real values of $t$ that make $b^2 - ac = 0$ only if
$\sin(\phi + \psi) = 0$ or $\sin(\phi - \psi) = 0$. These situations occur when $\phi = \pm\psi$ and $\phi = \pm\psi + \pi$.
Substituting into Equation B.4 gives that $b^2 - ac = 0$ iff both $\phi = \pm\psi$ or $\phi = \pm\psi + \pi$
and $t = 1$, where $t = 1$ is the same as $d_{01}/R_{01} = d_{02}/R_{02}$.

B.4 No Solutions for Scale

This appendix shows that there always exists a solution to the biquadratic by showing
that $b^2 - ac \ge 0$. From Appendix B.5,

$$b^2 - ac = 4 (R_{01} d_{02})^4 \bigl(t^2 - 2\cos(\phi + \psi)\, t + 1\bigr)\bigl(t^2 - 2\cos(\phi - \psi)\, t + 1\bigr)$$

$$\ge 4 (R_{01} d_{02})^4 (t^2 - 2t + 1)(t^2 - 2t + 1)$$

$$\ge 0$$

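B.5 Simplifying $b^2 - ac$

In this appendix, I derive that

\[ b^2 - ac = 4 (R_{01} d_{02})^4 \left( t^2 - 2\cos(\phi + \psi)\, t + 1 \right) \left( t^2 - 2\cos(\phi - \psi)\, t + 1 \right), \tag{B.5} \]

where

\[ t = \frac{R_{02} d_{01}}{R_{01} d_{02}}. \]

From Equations 2.18, 2.19, and 2.20,

\[
\begin{aligned}
a &= 4 (R_{01} R_{02} \sin\phi)^2 \\
b &= 2 \left( R_{01}^2 d_{02}^2 + R_{02}^2 d_{01}^2 - 2 R_{01} R_{02} d_{01} d_{02} \cos\phi \cos\psi \right) \\
c &= 4 (d_{01} d_{02} \sin\psi)^2
\end{aligned}
\]

Then

\[
\begin{aligned}
b^2 &= 4 \big( R_{02}^4 d_{01}^4 - 4 R_{02}^3 d_{01}^3 R_{01} d_{02} \cos\phi \cos\psi + 2 R_{01}^2 R_{02}^2 d_{01}^2 d_{02}^2 \\
 &\qquad + 4 R_{01}^2 R_{02}^2 d_{01}^2 d_{02}^2 \cos^2\phi \cos^2\psi - 4 R_{01}^3 d_{02}^3 R_{02} d_{01} \cos\phi \cos\psi + R_{01}^4 d_{02}^4 \big) \\
ac &= 16 R_{01}^2 R_{02}^2 d_{01}^2 d_{02}^2 \sin^2\phi \sin^2\psi \\
b^2 - ac &= 4 \big( R_{02}^4 d_{01}^4 - 4 R_{02}^3 d_{01}^3 R_{01} d_{02} \cos\phi \cos\psi \\
 &\qquad + (2 + 4\cos^2\phi \cos^2\psi - 4\sin^2\phi \sin^2\psi) R_{01}^2 R_{02}^2 d_{01}^2 d_{02}^2 \\
 &\qquad - 4 R_{01}^3 d_{02}^3 R_{02} d_{01} \cos\phi \cos\psi + R_{01}^4 d_{02}^4 \big) \\
 &= 4 (R_{01} d_{02})^4 \big( t^4 - 4\cos\phi \cos\psi\, t^3 + (2 + 4\cos^2\phi \cos^2\psi - 4\sin^2\phi \sin^2\psi)\, t^2 \\
 &\qquad - 4\cos\phi \cos\psi\, t + 1 \big), \quad \text{where } t = (R_{02} d_{01})/(R_{01} d_{02}) \\
 &= 4 (R_{01} d_{02})^4 \big( t^4 - 2(\cos(\phi + \psi) + \cos(\phi - \psi))\, t^3 \\
 &\qquad + (2 + 4\cos(\phi + \psi)\cos(\phi - \psi))\, t^2 - 2(\cos(\phi + \psi) + \cos(\phi - \psi))\, t + 1 \big) \\
 &= 4 (R_{01} d_{02})^4 \left( t^2 - 2\cos(\phi + \psi)\, t + 1 \right) \left( t^2 - 2\cos(\phi - \psi)\, t + 1 \right)
\end{aligned}
\]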



Appendix  C 

Generating  Random  Image  and 
Model  Points 

This  appendix  describes  how  I  generated  random  triples  of  image  points,  random  triples 
of  model  points,  and  random  point  models. 

C.1 Random Image Triples

Image triples were formed by randomly selecting three 2D locations from an image; the
image had dimensions 454 x 576. I selected image points within a margin of 20 pixels
from the boundary. The reason for the margin is that in Experiment 1 (Section 3.2), I
discarded propagated uncertainty regions that overlapped the boundary. In order to save
time, I used the margin to avoid generating such regions. This basically assumes that
image points close to the boundary can be ignored.

In addition to the constraint from the margin, another restriction I applied was to
pick image points that were at least 25px apart and at most 250px apart. The minimum
distance is used to avoid degenerate point triples, and the maximum distance is used to
reflect the expected size of an object found in an image. To get three points that were
between 25px and 250px apart, I began by placing the first point at the origin, (0, 0). To
get a point at most 250px away from the first, the second point was chosen at random
from a square centered at the origin of side 2 * 250 + 1 = 501 px. This step was repeated
until a point that was at least 25px and at most 250px away from the first point was
selected. The third point was repeatedly chosen at random from the same square until
a point at least 25px and at most 250px from both of the first two points was selected.
This gave an arbitrary triangle within the given distance bounds.

In  order  to  allow  the  triangle  to  arise  anywhere  in  the  image,  the  triangle  was  then 
randomly  translated  by  putting  the  first  point  at  a  location  randomly  chosen  from  within 
the  margin  of  the  image,  until  a  translation  was  found  that  left  all  three  points  within 
the  margin. 


C.2  Random  Model  Triples 

Given a list of model points, which could come from a random model or a true model,
first two different points in the list were selected randomly. Then a third point was
repeatedly selected at random until a point was found that was non-collinear with the
first two. Three points were considered to be collinear if the triangle formed by the three
had any angle greater than 175°.
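
A minimal Python sketch of this selection rule is given below. The function names are hypothetical, points are assumed to be distinct tuples of equal dimension, and the loop assumes the list contains at least one non-collinear triple.

```python
import math
import random

COLLINEAR_ANGLE_DEG = 175.0  # threshold stated in the text


def _angle_deg(a, b, c):
    """Angle at vertex a between the rays a->b and a->c, in degrees."""
    v1 = [bi - ai for ai, bi in zip(a, b)]
    v2 = [ci - ai for ai, ci in zip(a, c)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    cos_angle = max(-1.0, min(1.0, dot / (n1 * n2)))  # clamp rounding error
    return math.degrees(math.acos(cos_angle))


def is_collinear(p, q, r):
    """True if any angle of triangle pqr exceeds 175 degrees."""
    angles = (_angle_deg(p, q, r), _angle_deg(q, p, r), _angle_deg(r, p, q))
    return max(angles) > COLLINEAR_ANGLE_DEG


def random_model_triple(points, rng=random):
    """Pick two distinct points, then redraw a third until non-collinear."""
    i, j = rng.sample(range(len(points)), 2)
    while True:
        k = rng.randrange(len(points))
        if k in (i, j):
            continue
        if not is_collinear(points[i], points[j], points[k]):
            return points[i], points[j], points[k]
```

For example, `is_collinear((0, 0, 0), (1, 0, 0), (2, 0, 0))` is true because the angle at the middle point is 180°.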


C.3  Random  Models 

All the generated models had ten points. Ten was chosen because it is a low bound on
the number of points in a model, or, equivalently, the number of propagated regions per
trial. I wanted a low bound in order to conservatively estimate how well the uncertainty
circles fit. A low bound leads to generating more propagated regions with different poses.
Trying more poses increases the chance of hitting cases where the uncertainty circles are
fit poorly.

To reflect the final appearance of the model in the image, the model points were all
chosen to be between 25px and 250px apart. Note that the initial scale of the model is
irrelevant, since scale is computed in the pose solution. To get a 3D model, the first point
was put at the origin. Then the other model points were selected at random from a cube
centered at the origin with side 501 px. In addition, each new model point was repeatedly
chosen until it was at least 25px and at most 250px from all the current model points.
As with scale, the initial translation of the model is arbitrary, since translation is solved
for when the pose is computed.


Appendix  D 

Computing  Areas  of  the  True 
Uncertainty  Regions 


This appendix describes how the true uncertainty regions are computed from a model
and three matched model and image points. First each model point is tested for whether
it is in the plane of the three matched model points. If so, the area of its uncertainty
region is computed from the known analytic solution for this case [Jacobs91].

In general, the model points will not lie in the plane of the matched model points.
In these cases, the true regions are computed by uniformly sampling twenty-five points
along the circle boundaries of the three matched image points. This gives 25^3 = 15625
samples for each propagated uncertainty region. To obtain the area, first the propagated
sample points are written to an image. Then the outer boundary defined by the points
in the image is traversed in a four-connected walk. Lastly, all the pixels inside this outer
boundary are counted to get the area. Observe that this method can cause the true area
to be overestimated because the pixels inside the four-connected boundary can include
eight-connected pixels that are not part of the region.
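
The sampling step can be illustrated with a short Python sketch. It only enumerates the 25^3 combinations of perturbed image points; propagating each combination through the pose solution and rasterizing the result are not shown. The uncertainty radius and all names are assumptions introduced for the example.

```python
import itertools
import math

SAMPLES_PER_CIRCLE = 25


def circle_samples(center, radius, n=SAMPLES_PER_CIRCLE):
    """n points spaced uniformly in angle on the circle boundary."""
    cx, cy = center
    return [(cx + radius * math.cos(2.0 * math.pi * k / n),
             cy + radius * math.sin(2.0 * math.pi * k / n))
            for k in range(n)]


def perturbed_triples(image_points, radius, n=SAMPLES_PER_CIRCLE):
    """All combinations of one boundary sample per matched image point."""
    rings = [circle_samples(p, radius, n) for p in image_points]
    return itertools.product(*rings)


# Hypothetical matched image points and uncertainty radius.
matched = [(100.0, 100.0), (200.0, 120.0), (150.0, 220.0)]
count = sum(1 for _ in perturbed_triples(matched, radius=5.0))
print(count)
```

With three matched points and twenty-five samples per circle, the count is 25^3 = 15625, matching the figure quoted above.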

There are two solutions for each pair of model and image triples, which correspond
to a reflection about a plane parallel to the image (Chapter 2). In the pose solution of
Chapter 2, $H_1$ and $H_2$ represent the differences in the $z$ coordinates between the first
model point and the second and third model points, respectively; the differences for the
reflected solution are therefore $-H_1$ and $-H_2$. To distinguish the two sets of points
corresponding to the two weak-perspective solutions, I use the nominal values of $H_1$ and
$H_2$, which occur when the matched image points are at their nominal locations. If the
nominal $H_1$ is larger, I take all the solutions with the same sign for $H_1$ as being from
the same region. I do the opposite if the nominal $H_2$ is larger. For the most part,
this method works to separate the two regions as long as they do not overlap. If the
propagated regions do overlap, there really is one region, and this method will cause it
to split.


Appendix  E 

Areas  and  Volumes  of  Line 
Uncertainty  Regions 


E.1 True Area of a Line Uncertainty Region


Given a line segment of known orientation and length, the area of the uncertainty region
in Fig. 4-2 can be computed by moving the line segment perpendicular to its orientation.
This is shown in Fig. 4-5 parametrized by $u$. This section computes the uncertainty
region area. For simplicity, I assume the uncertainty circles for the endpoints do not
intersect.

For a given offset $u$, we are interested in the distance between the outer intersection
points of the line and the circles. From the figure, this distance equals

\[ \| (x_2, y_2) - (x_1, y_1) \| = \frac{x_2 - x_1}{\cos\theta} = \frac{y_2 - y_1}{\sin\theta}. \]

Putting the origin at the smaller circle, the equations of the line and circles are

\[ -x \sin\theta + y \cos\theta = u \tag{E.1} \]

\[ x^2 + y^2 = r^2 \tag{E.2} \]

\[ (x - L)^2 + y^2 = R^2 \tag{E.3} \]

Assuming $\cos\theta \ne 0$, we can solve for $y$ in Equation E.1 and substitute into Equation E.2:

\[
\begin{aligned}
 & x^2 + \left( \frac{u + x \sin\theta}{\cos\theta} \right)^2 = r^2 \\
\Longrightarrow\ & x^2 \cos^2\theta + (u + x \sin\theta)^2 - r^2 \cos^2\theta = 0 \\
\Longrightarrow\ & x^2 + 2 u \sin\theta\, x + u^2 - r^2 \cos^2\theta = 0 \\
\Longrightarrow\ & x = -u \sin\theta \pm \sqrt{r^2 - u^2}\, \cos\theta \\
 & x_1 = -u \sin\theta - \sqrt{r^2 - u^2}\, \cos\theta, \quad \text{from Fig. 4-5.}
\end{aligned}
\]

Note that the discriminant is non-negative, since $|u| \le r$ from Fig. 4-5. Next, substitute
for $y$ from Equation E.1 into Equation E.3:

\[
\begin{aligned}
 & (x - L)^2 + \left( \frac{u + x \sin\theta}{\cos\theta} \right)^2 = R^2 \\
\Longrightarrow\ & (x - L)^2 \cos^2\theta + (u + x \sin\theta)^2 - R^2 \cos^2\theta = 0 \\
\Longrightarrow\ & x^2 + 2(-L \cos^2\theta + u \sin\theta)\, x + u^2 + L^2 \cos^2\theta - R^2 \cos^2\theta = 0 \\
\Longrightarrow\ & x = L \cos^2\theta - u \sin\theta \pm \sqrt{R^2 - (L \sin\theta + u)^2}\, \cos\theta \\
 & x_2 = L \cos^2\theta - u \sin\theta + \sqrt{R^2 - (L \sin\theta + u)^2}\, \cos\theta, \quad \text{from Fig. 4-5.}
\end{aligned}
\]

Again, the discriminant is non-negative; the maximum value of $u = \min(r, R - L \sin\theta)$
(see Fig. 4-5), and so

\[
\begin{aligned}
 & u \le R - L \sin\theta \\
\Longrightarrow\ & u + L \sin\theta \le R \\
\Longrightarrow\ & (u + L \sin\theta)^2 \le R^2, \quad \text{since } |u| \le R.
\end{aligned}
\]

Given $x_1$ and $x_2$,

\[ \frac{x_2 - x_1}{\cos\theta} = L \cos\theta + \sqrt{R^2 - (u + L \sin\theta)^2} + \sqrt{r^2 - u^2} \tag{E.4} \]
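
The closed forms for $x_1$, $x_2$, and Equation E.4 can be checked numerically. The Python sketch below recomputes the two outer intersection points and compares their distance with E.4. The function names and sample values are illustrative assumptions, and the sample is chosen so that the line meets both circles.

```python
import math


def outer_points(u, theta, L, r, R):
    """Outer intersections of the offset line with the two circles."""
    s, c = math.sin(theta), math.cos(theta)
    x1 = -u * s - math.sqrt(r * r - u * u) * c
    x2 = L * c * c - u * s + math.sqrt(R * R - (L * s + u) ** 2) * c
    # Recover y from the line equation -x sin(theta) + y cos(theta) = u.
    y1 = (u + x1 * s) / c
    y2 = (u + x2 * s) / c
    return (x1, y1), (x2, y2)


def span_length(u, theta, L, r, R):
    """Right-hand side of Equation E.4."""
    return (L * math.cos(theta)
            + math.sqrt(R * R - (u + L * math.sin(theta)) ** 2)
            + math.sqrt(r * r - u * u))


# Hypothetical sample: small circle r = 5 at the origin, large circle R = 8 at (100, 0).
theta, L, r, R, u = 0.05, 100.0, 5.0, 8.0, 1.0
(x1, y1), (x2, y2) = outer_points(u, theta, L, r, R)
print(math.hypot(x2 - x1, y2 - y1), span_length(u, theta, L, r, R))
```

The two printed values should agree up to rounding, since the distance along the line is $(x_2 - x_1)/\cos\theta$.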

The area $A$ of the shaded region in Fig. 4-2 equals the integral of Equation E.4 from
$u = -r$ to $u = \min(r, R - L \sin\theta)$, if the region exists. The region exists if the image
segment's orientation is within the bounds of the line uncertainty region, that is, if
$L \sin\theta \le R + r$ (Fig. 4-4). Thus

\[
A =
\begin{cases}
\displaystyle \int_{-r}^{\min(r,\, R - L \sin\theta)} \left( L \cos\theta + \sqrt{R^2 - (u + L \sin\theta)^2} + \sqrt{r^2 - u^2} \right) du & \text{if } L \sin\theta \le R + r, \\
0 & \text{otherwise.}
\end{cases}
\tag{E.5}
\]
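
Since the integral for $A$ is not evaluated in closed form here, a numerical estimate can be useful for checking. The sketch below applies a simple midpoint rule to the integrand of Equation E.4; the midpoint rule, the step count, and the function names are assumptions of this example rather than the method used in the thesis.

```python
import math


def span_length(u, theta, L, r, R):
    """Integrand: Equation E.4, with radicands clamped against rounding."""
    big = max(0.0, R * R - (u + L * math.sin(theta)) ** 2)
    small = max(0.0, r * r - u * u)
    return L * math.cos(theta) + math.sqrt(big) + math.sqrt(small)


def region_area(theta, L, r, R, steps=20000):
    """Midpoint-rule estimate of the area A of the line uncertainty region."""
    if L * math.sin(theta) > R + r:
        return 0.0  # the region does not exist for this orientation
    lower = -r
    upper = min(r, R - L * math.sin(theta))
    h = (upper - lower) / steps
    return h * sum(span_length(lower + (i + 0.5) * h, theta, L, r, R)
                   for i in range(steps))


# Sanity case: theta = 0 and r = R gives a rectangle plus two half-disks.
print(region_area(0.0, 100.0, 5.0, 5.0), 2 * 5.0 * 100.0 + math.pi * 5.0 ** 2)
```

For $\theta = 0$ and $r = R$ the region is a $2r \times L$ rectangle capped by two half-disks, so the estimate should be close to $2rL + \pi r^2$.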


This is not the area we are interested in, however. Instead, we want this area shrunk
by the length of the image segment. Let $l$ be the length of the image segment. To
compute the desired area, subtract $l$ from the term being integrated, Equation E.4. In
addition, we must change the upper limit of the integration, since it is constrained by $l$.
In particular, we need to know if and where the term being integrated crosses zero, which is

\[ -l + L \cos\theta + \sqrt{R^2 - (u + L \sin\theta)^2} + \sqrt{r^2 - u^2} = 0. \tag{E.6} \]

This equation leads to a quadratic in $u$:

\[ (K_1^2 + 1)\, u^2 + 2 K_1 K_2\, u + K_2^2 - r^2 = 0, \tag{E.7} \]

where

\[ K_1 = \frac{L \sin\theta}{L \cos\theta - l} \tag{E.8} \]

\[ K_2 = \frac{L^2 - R^2 + r^2 + l^2 - 2 l L \cos\theta}{2 (L \cos\theta - l)} \tag{E.9} \]

so that

\[ u = \frac{-K_1 K_2 \pm \sqrt{(K_1^2 + 1)\, r^2 - K_2^2}}{K_1^2 + 1} \tag{E.10} \]

The term being integrated crosses zero if the discriminant is non-negative. There are
two solutions because squaring was used in the algebra to obtain Equation E.7 from
Equation E.6. If the discriminant is non-negative, let $u''$ be the $u$ from Equation E.10
that satisfies Equation E.6, and let $u_{max} = \min(u'', r, R - L \sin\theta)$; otherwise let
$u_{max} = \min(r, R - L \sin\theta)$. Then the area of translations is

\[
\begin{cases}
\displaystyle \int_{-r}^{u_{max}} \left( -l + L \cos\theta + \sqrt{R^2 - (u + L \sin\theta)^2} + \sqrt{r^2 - u^2} \right) du & \text{if } L \sin\theta \le R + r, \\
0 & \text{otherwise.}
\end{cases}
\tag{E.11}
\]

E.2 Integrating Areas to Volumes

\[
\begin{aligned}
V_1 &= \int (R + r + L \cos\theta - l)\, 2r \, d\theta = \int (R + r - l)\, 2r \, d\theta + \int 2 r L \cos\theta \, d\theta \\
 &= (R + r - l)\, 2r\, \theta + 2 r L \sin\theta
\end{aligned}
\tag{E.12}
\]

\[
\begin{aligned}
V_2 &= \int (R + r + L \cos\theta - l)(R + r - L \sin\theta) \, d\theta \\
 &= \int (R + r - l)(R + r) \, d\theta - \int (R + r - l) L \sin\theta \, d\theta + \int (R + r) L \cos\theta \, d\theta \\
 &\quad - \int L^2 \cos\theta \sin\theta \, d\theta \\
 &= (R + r - l)(R + r)\, \theta + (R + r - l) L \cos\theta + (R + r) L \sin\theta - \tfrac{1}{2} L^2 \sin^2\theta
\end{aligned}
\tag{E.13}
\]

\[ V_3 = \int (2R - l)\, 2r \, d\theta = (2R - l)\, 2r\, \theta \tag{E.14} \]

\[
\begin{aligned}
V_4 &= \int (2R - l)(R + r - L \sin\theta) \, d\theta = \int (2R - l)(R + r) \, d\theta - \int (2R - l) L \sin\theta \, d\theta \\
 &= (2R - l)(R + r)\, \theta + (2R - l) L \cos\theta
\end{aligned}
\tag{E.15}
\]
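
One way to check the four antiderivatives is to differentiate them numerically and compare against the integrands. The Python sketch below does this with a central difference. The function names, step size, and sample values are assumptions of the example, and the last term of $V_2$ is taken as $-\tfrac{1}{2} L^2 \sin^2\theta$, which is one valid antiderivative of $-L^2 \cos\theta \sin\theta$.

```python
import math


def integrands(theta, L, l, r, R):
    """The four terms being integrated over theta."""
    return (
        (R + r + L * math.cos(theta) - l) * 2.0 * r,
        (R + r + L * math.cos(theta) - l) * (R + r - L * math.sin(theta)),
        (2.0 * R - l) * 2.0 * r,
        (2.0 * R - l) * (R + r - L * math.sin(theta)),
    )


def antiderivatives(theta, L, l, r, R):
    """Closed forms for V1..V4, up to constants of integration."""
    return (
        (R + r - l) * 2.0 * r * theta + 2.0 * r * L * math.sin(theta),
        ((R + r - l) * (R + r) * theta + (R + r - l) * L * math.cos(theta)
         + (R + r) * L * math.sin(theta) - 0.5 * L * L * math.sin(theta) ** 2),
        (2.0 * R - l) * 2.0 * r * theta,
        (2.0 * R - l) * (R + r) * theta + (2.0 * R - l) * L * math.cos(theta),
    )


def numerical_derivatives(theta, L, l, r, R, h=1e-5):
    """Central-difference derivative of each antiderivative."""
    hi = antiderivatives(theta + h, L, l, r, R)
    lo = antiderivatives(theta - h, L, l, r, R)
    return tuple((a - b) / (2.0 * h) for a, b in zip(hi, lo))


# Hypothetical sample values.
params = (0.4, 100.0, 60.0, 5.0, 8.0)
print(integrands(*params))
print(numerical_derivatives(*params))
```

The two printed tuples should agree to several decimal places.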


Appendix  F 

Recurrence  Relation  for  the 
Likelihood  of  a  Hypothesis 


Let $X_i$ be the event that none of the uniformly distributed image features landed in
region $R_i$. Then the probability that at least one image feature landed in every region is
$1 - p(X_1 \vee \cdots \vee X_k)$, which implies

\[
\begin{aligned}
p(\bar{M} \mid \bar{H}) &= p(X_1 \vee \cdots \vee X_k) \\
 &= p(X_1 \vee \cdots \vee X_{k-1}) + p(X_k) - p((X_1 \vee \cdots \vee X_{k-1}) \wedge X_k) \\
 &= p(X_1 \vee \cdots \vee X_{k-1}) + p(X_k) \left( 1 - p(X_1 \vee \cdots \vee X_{k-1} \mid X_k) \right)
\end{aligned}
\tag{F.1}
\]

$p(X_1 \vee \cdots \vee X_k)$ is a function of the uncertainty region sizes, $S_i$ for $i = 1, 2, \ldots, k$, the
maximum size, $S_I$, and the number of uniformly distributed features, $r$. For points, the
sizes are given by the areas of the uncertainty regions, and for lines the sizes are given by
the volumes. To make the dependency explicit, define

\[ q_r(S_I; S_1, \ldots, S_k) \equiv p(X_1 \vee \cdots \vee X_k) \]

Therefore in Equation F.1, $p(X_1 \vee \cdots \vee X_{k-1}) = q_r(S_I; S_1, \ldots, S_{k-1})$.

Next, let us consider $p(X_1 \vee \cdots \vee X_{k-1} \mid X_k)$. If event $X_k$ occurs, that is, if no features
land in the $k$th region, then all of the features are distributed over the rest of the image,
so that

\[ p(X_1 \vee \cdots \vee X_{k-1} \mid X_k) = q_r(S_I - S_k; S_1, \ldots, S_{k-1}) \]


Lastly, $p(X_k)$ is the probability that all $r$ features missed the $k$th region, which
equals $(1 - S_k / S_I)^r$. Define

\[ Q_r(S_I; S_k) \equiv p(X_k) \]

Plugging into Equation F.1, $p(\bar{M} \mid \bar{H})$ is given by $q_r(S_I; S_1, \ldots, S_k)$, which is
determined by the recurrence relation,

\[
q_r(S_I; S_1, \ldots, S_k) =
\begin{cases}
Q_r(S_I; S_k) \left( 1 - q_r(S_I - S_k; S_1, \ldots, S_{k-1}) \right) + q_r(S_I; S_1, \ldots, S_{k-1}) & \text{if } k > 1, \\
Q_r(S_I; S_1) & \text{if } k = 1 \text{ and } S_1 \le S_I, \\
0 & \text{otherwise,}
\end{cases}
\]

where

\[ Q_r(S_I; S) = \left( 1 - \frac{S}{S_I} \right)^r \]
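
The recurrence translates directly into a short recursive function. The Python sketch below is an illustration under stated assumptions: $r$ is taken to be the number of uniformly distributed features, the regions are treated as disjoint, the guard for a non-positive remaining size is an addition for robustness, and the function names are hypothetical.

```python
def Q(r, S_I, S):
    """Probability that all r uniform features miss a region of size S."""
    return (1.0 - S / S_I) ** r


def q(r, S_I, sizes):
    """Probability that at least one of the regions receives no feature."""
    if S_I <= 0:
        return 0.0  # assumption: nothing left of the image to distribute over
    k = len(sizes)
    if k > 1:
        *rest, S_k = sizes
        return Q(r, S_I, S_k) * (1.0 - q(r, S_I - S_k, rest)) + q(r, S_I, rest)
    if k == 1 and sizes[0] <= S_I:
        return Q(r, S_I, sizes[0])
    return 0.0


# Hypothetical example: 3 features, image of size 100, regions of size 10 and 20.
print(q(3, 100.0, [10.0, 20.0]))
```

For two disjoint regions the result can be cross-checked against inclusion-exclusion, $p(X_1) + p(X_2) - p(X_1 \wedge X_2)$, where $p(X_1 \wedge X_2) = (1 - (S_1 + S_2)/S_I)^r$.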